IT Cooking

Success is just one script away

2018 Comparison of Popular Archive Utilities


2018 comparison of popular archive, backup, and compression utilities: tar, ZIP, gzip, RAR, and 7-zip

Definition of Backup vs Archive

An archive is a collection of historical records that are kept for long-term retention and used for future reference. Typically, archives contain data that is not actively used.

A backup is a copy of a data set, while an archive holds original data that has been removed from its original location (Dorion 2008).

The two overlap, and the distinction lies in intent: a backup protects data that stays in active use, whereas an archive holds data that is meant to be removed from its active state.
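The distinction can be sketched with tar alone. This is a minimal illustration, not a retention policy: the scratch directory and the file names (current, closed, notes.txt, report.txt) are assumptions made up for the example.

```shell
set -eu

# Hypothetical scratch area; every path below is invented for the example.
work=$(mktemp -d)
mkdir -p "$work/data/current" "$work/data/closed"
echo "still edited daily" > "$work/data/current/notes.txt"
echo "finished last year" > "$work/data/closed/report.txt"

# Backup: copy the data set; the original stays where it is.
tar -cf "$work/backup.tar" -C "$work/data" current

# Archive: store the data, check that the archive is readable,
# and only then remove the data from its original location.
tar -cf "$work/archive.tar" -C "$work/data" closed
tar -tf "$work/archive.tar" > /dev/null && rm -r "$work/data/closed"

# Only the backed-up (still active) directory remains in place.
ls "$work/data"
```

After the run, data/current is still on disk next to its backup, while data/closed only exists inside archive.tar.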

 

TAR

Tar is an archive utility that simply stacks files sequentially, taken from a file list or from its arguments, into a single file. Each file is preceded by a small header record that stores its name, metadata, and size, and the archive ends with zero-filled records. It was developed for UNIX in 1979 for tape archive recorders (Deutsch and Aladdin Enterprises 1996). The original tar cannot span files, cannot compress, cannot encrypt, and cannot back up unnamed pipes. Compression is achieved on the fly by piping the output to compress or gzip. Modern versions of tar can call local Unix compressors such as compress or gzip directly, and there are many variants such as gtar (GNU tar). This is confusing because the tar command usually keeps the same name, and one cannot guess its capabilities without printing its usage with tar --help.

Option arguments do not always require dashes (“Tar(5) — Format Of Tape Archive Files” 2004).
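A quick way to see both points, the optional dash and the record-based layout, is the sketch below. It assumes a tar that accepts the old dash-less option style (GNU tar and bsdtar both do); the file names foo and bar are placeholders.

```shell
set -eu

work=$(mktemp -d)
cd "$work"
printf 'hello\n' > foo
printf 'world\n' > bar

# Old-style options (no dash) and short options (with dash) do the same job.
tar cf old-style.tar foo bar
tar -cf dashed.tar foo bar

old_list=$(tar tf old-style.tar)
new_list=$(tar -tf dashed.tar)

# A tar archive is a sequence of 512-byte records: one header record per
# file, the file data padded to a record boundary, then zero-filled records.
size=$(wc -c < old-style.tar | tr -d ' ')
remainder=$((size % 512))

echo "members: $old_list"
echo "size: $size bytes, remainder modulo 512: $remainder"
```

The archive size is always a whole number of 512-byte records, which is why even a two-line archive takes several kilobytes on disk.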

tar usage (truncated)

Usage: tar [OPTION...] [FILE]...
GNU 'tar' saves many files together into a single tape or disk archive, and can
restore individual files from the archive.

Examples:
  tar -cf archive.tar foo bar  # Create archive.tar from files foo and bar.
  tar -tvf archive.tar         # List all files in archive.tar verbosely.
  tar -xf archive.tar          # Extract all files from archive.tar.

 Main operation mode:

  -A, --catenate, --concatenate   append tar files to an archive
  -c, --create               create a new archive
  -d, --diff, --compare      find differences between archive and file system
      --delete               delete from the archive (not on mag tapes!)
  -r, --append               append files to the end of an archive
  -t, --list                 list the contents of an archive
      --test-label           test the archive volume label and exit
  -u, --update               only append files newer than copy in archive
  -x, --extract, --get       extract files from an archive

 Operation modifiers:

      --check-device         check device numbers when creating incremental
                             archives (default)
  -g, --listed-incremental=FILE   handle new GNU-format incremental backup
  -G, --incremental          handle old GNU-format incremental backup
      --ignore-failed-read   do not exit with nonzero on unreadable files
      --level=NUMBER         dump level for created listed-incremental archive
  -n, --seek                 archive is seekable
      --no-check-device      do not check device numbers when creating
                             incremental archives
      --no-seek              archive is not seekable
      --occurrence[=NUMBER]  process only the NUMBERth occurrence of each file
                             in the archive; this option is valid only in
                             conjunction with one of the subcommands --delete,
                             --diff, --extract or --list and when a list of
                             files is given either on the command line or via
                             the -T option; NUMBER defaults to 1
      --sparse-version=MAJOR[.MINOR]
                             set version of the sparse format to use (implies
                             --sparse)
  -S, --sparse               handle sparse files efficiently

 

tar examples

Output file list piped to tar (-T - makes tar read the list from standard input):

<command that produces file list> | tar cf backupfile.tar -T -

List content of tar file:

tar tvf backupfile.tar
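The two steps above, feeding tar a file list and then listing the result, can be tried end to end with find as the list producer. This is a sketch under assumptions: the src tree and the *.log pattern are invented, and -T - (--files-from) is the option GNU tar and bsdtar use to read member names from standard input.

```shell
set -eu

work=$(mktemp -d)
cd "$work"
mkdir -p src/sub
echo "keep" > src/a.log
echo "keep too" > src/sub/b.log
echo "skip" > src/c.txt

# find prints one path per line; -T - tells tar to read that list from stdin.
find src -name '*.log' -print | tar -cf logs.tar -T -

# List what actually went into the archive.
members=$(tar -tf logs.tar | LC_ALL=C sort)
echo "$members"
```

Only the two .log files end up in logs.tar. A newline-separated list breaks on file names that contain newlines; GNU tar can read a NUL-separated list instead when find -print0 is paired with --null.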

Back up a directory recursively:

tar cf backupfile.tar /path/to/backup

Back up a directory recursively with gzip maximum compression:

tar cf - /path/to/compress | gzip -9 > backupfile.tar.gz

Back up with gzip and rename directories inside the backup file:

tar zcvf backupfile.tar.gz --transform=s/path2rename/newName/ path2backup
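The reverse direction follows the same piping idea: gzip decompresses to standard output and tar reads the archive from standard input. A minimal round trip, with invented directory names (project, restore), might look like this:

```shell
set -eu

work=$(mktemp -d)
cd "$work"
mkdir -p project/docs restore
echo "readme text" > project/docs/readme.txt

# Create: tar writes the archive to stdout, gzip compresses the stream.
tar -cf - project | gzip -9 > project.tar.gz

# List without unpacking anything to disk.
listing=$(gzip -dc project.tar.gz | tar -tf -)

# Extract into a separate directory (-C changes directory before extracting).
gzip -dc project.tar.gz | tar -xf - -C restore

# Compare the restored file with the original.
result=different
cmp project/docs/readme.txt restore/project/docs/readme.txt && result=identical
echo "$result"
```

Extracting into a separate directory avoids overwriting the live copy while the archive is being checked.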

 

ZIP

Zip was created in 1989 by Phil Katz to replace competing formats such as ARC, and traditionally uses DEFLATE compression (just like gzip). It has been extensively developed, maintained, and openly documented by PKWARE (PKWARE 2017). It is a de facto industry standard that now handles more algorithms and supports file spanning and encryption (ZipCrypto and AES). It is probably the most popular archive format in the world, in part because DEFLATE is fast when creating archives. It is built into operating systems such as macOS and Windows, whose file managers can create compressed folders on the fly. Its primary behavior makes it a backup utility (Adler 2008).

Algorithms:

  • Deflate Standard LZ77-based algorithm
  • Deflate64 Standard LZ77-based algorithm
  • BZip2 Standard BWT algorithm
  • PPMD Dmitry Shkarin’s PPMdH with small changes
  • LZMA Improved and optimized version of LZ77 algorithm

zip usage

On Unix-like systems, the zip utility is split into two binaries: zip to compress and unzip to decompress:

Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.
Zip 3.0 (July 5th 2008). Usage:
zip [-options] [-b path] [-t mmddyyyy] [-n suffixes] [zipfile list] [-xi list]
  The default action is to add or replace zipfile entries from list, which
  can include the special name - to compress standard input.
  If zipfile and list are omitted, zip compresses stdin to stdout.
  -f   freshen: only changed files  -u   update: only changed or new files
  -d   delete entries in zipfile    -m   move into zipfile (delete OS files)
  -r   recurse into directories     -j   junk (don't record) directory names
  -0   store only                   -l   convert LF to CR LF (-ll CR LF to LF)
  -1   compress faster              -9   compress better
  -q   quiet operation              -v   verbose operation/print version info
  -c   add one-line comments        -z   add zipfile comment
  -@   read names from stdin        -o   make zipfile as old as latest entry
  -x   exclude the following names  -i   include only the following names
  -F   fix zipfile (-FF try harder) -D   do not add directory entries
  -A   adjust self-extracting exe   -J   junk zipfile prefix (unzipsfx)
  -T   test zipfile integrity       -X   eXclude eXtra file attributes
  -y   store symbolic links as the link instead of the referenced file
  -e   encrypt                      -n   don't compress these suffixes
  -h2  show more help

 

unzip usage

UnZip 6.00 of 20 April 2009, by Info-ZIP.  Maintained by C. Spieler.  Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.

Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
  Default action is to extract files in list, except those in xlist, to exdir;
  file[.zip] may be a wildcard.  -Z => ZipInfo mode ("unzip -Z" for usage).

  -p  extract files to pipe, no messages     -l  list files (short format)
  -f  freshen existing files, create none    -t  test compressed archive data
  -u  update files, create if necessary      -z  display archive comment only
  -v  list verbosely/show version info       -T  timestamp archive to latest
  -x  exclude files that follow (in xlist)   -d  extract files into exdir
modifiers:
  -n  never overwrite existing files         -q  quiet mode (-qq => quieter)
  -o  overwrite files WITHOUT prompting      -a  auto-convert any text files
  -j  junk paths (do not make directories)   -aa treat ALL files as text
  -U  use escapes for all non-ASCII Unicode  -UU ignore any Unicode fields
  -C  match filenames case-insensitively     -L  make (some) names lowercase
  -X  restore UID/GID info                   -V  retain VMS version numbers
  -K  keep setuid/setgid/tacky permissions   -M  pipe through "more" pager
See "unzip -hh" or unzip.txt for more help.  Examples:
  unzip data1 -x joe   => extract all files except joe from zipfile data1.zip
  unzip -p foo | more  => send contents of foo.zip via pipe into program more
  unzip -fo foo ReadMe => quietly replace existing ReadMe if archive file newer

 

zip examples

Create an archive:

zip archive.zip file(s)

Create an archive recursively:

zip -r archive.zip directory(s)

List archive content:

unzip -l archive.zip

Check integrity of an archive:

zip -T archive.zip

unzip -t archive.zip

Extract an archive:

unzip archive.zip

Update (refresh) files inside an archive:

zip -u archive.zip file(s)

Zip list of files returned from the find command:

find . -name "pattern" -print | zip archive.zip -@
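The find pipeline can be combined with the integrity check in one short script. This sketch assumes the Info-ZIP zip and unzip binaries are installed and skips itself otherwise; the site directory and its files are invented for the example.

```shell
set -eu

status=skipped
if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
  work=$(mktemp -d)
  cd "$work"
  mkdir -p site/css
  echo "<p>hi</p>" > site/index.html
  echo "p { margin: 0 }" > site/css/main.css
  echo "scratch" > site/notes.tmp

  # -@ reads the names to add from stdin; -q keeps zip quiet.
  find site -type f ! -name '*.tmp' -print | zip -q site.zip -@

  # -t tests every entry against its stored CRC.
  unzip -tq site.zip > /dev/null

  # ZipInfo mode (-Z) with -1 prints member names only.
  members=$(unzip -Z1 site.zip | LC_ALL=C sort)
  echo "$members"
  status=ok
fi
echo "zip example: $status"
```

Because find does the selection, the .tmp file never reaches the archive, which is often simpler than combining zip's own -x and -i filters.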

 

gzip

Gzip is an inline compression utility that compresses data streams piped through it, or compresses files in place. It uses the DEFLATE algorithm and was developed in 1992 to replace the compress program from early UNIX systems (J.-L. Gailly 2003). The “g” stands for GNU, the project sponsored by the Free Software Foundation (Free Software Foundation 2018). Because of its on-the-fly abilities, the format is widely used for HTTP compression and is supported by virtually every web server available today (J. Gailly 2017). It does not support file spanning. Its primary behavior (in-place compression) makes it an archive utility.
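The stream-oriented nature of gzip shows up when two compressed files are concatenated: the result is still valid gzip data, but it decompresses to a single stream, with no file names or boundaries to separate the parts. A small sketch, with placeholder file names:

```shell
set -eu

work=$(mktemp -d)
cd "$work"
printf 'first file\n' > one.txt
printf 'second file\n' > two.txt

# -c writes the compressed stream to stdout, so the originals are kept.
gzip -c one.txt > one.txt.gz
gzip -c two.txt > two.txt.gz

# Two gzip members back to back are decoded as one continuous stream.
cat one.txt.gz two.txt.gz > both.gz
joined=$(gzip -dc both.gz)
echo "$joined"
```

This is why gzip is normally paired with tar when several files have to travel together.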

Algorithms:

  • Deflate Standard LZ77-based algorithm

 

gzip Usage

Usage: gzip [OPTION]... [FILE]...
Compress or uncompress FILEs (by default, compress FILES in-place).

Mandatory arguments to long options are mandatory for short options too.

  -c, --stdout      write on standard output, keep original files unchanged
  -d, --decompress  decompress
  -f, --force       force overwrite of output file and compress links
  -h, --help        give this help
  -k, --keep        keep (don't delete) input files
  -l, --list        list compressed file contents
  -L, --license     display software license
  -n, --no-name     do not save or restore the original name and time stamp
  -N, --name        save or restore the original name and time stamp
  -q, --quiet       suppress all warnings
  -r, --recursive   operate recursively on directories
  -S, --suffix=SUF  use suffix SUF on compressed files
  -t, --test        test compressed file integrity
  -v, --verbose     verbose mode
  -V, --version     display version number
  -1, --fast        compress faster
  -9, --best        compress better
  --rsyncable       Make rsync-friendly archive

With no FILE, or when FILE is -, read standard input.

 

gzip Examples

Compress a file in place and delete original file (produces file.gz):

gzip file

Compress file to file.gz and keep original:

gzip -k file

gzip -c file > file.gz

Decompress an archive and remove it:

gzip -d file.gz

Compress the output of mysqldump into a gzipped file:

mysqldump --opt <database> | gzip -c > database.sql.gz
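A compressed dump is only useful if it can be read back, so it is worth testing it and streaming it out again. The sketch below replaces mysqldump with a hypothetical fake_dump function so that it runs without a database; the SQL statements are placeholders.

```shell
set -eu

work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for a dump producer such as mysqldump.
fake_dump() {
  printf 'CREATE TABLE t (id INT);\n'
  printf 'INSERT INTO t VALUES (1);\n'
}

# Compress the stream as it is produced; nothing uncompressed hits the disk.
fake_dump | gzip -c > database.sql.gz

# -t checks the compressed data without writing any output.
integrity=failed
gzip -t database.sql.gz && integrity=ok

# -dc streams the content back out and leaves the .gz file in place.
statements=$(gzip -dc database.sql.gz | wc -l | tr -d ' ')
echo "integrity: $integrity, statements: $statements"
```

In a real restore, the gzip -dc output would be piped into the database client instead of wc.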

 

RAR

RAR is another backup compression utility, developed by a Russian engineer in 1993, with a proprietary, licensed algorithm (win.rar GmbH 2018). It has been updated over time and can decompress a variety of formats. It supports encryption and file spanning (win.rar 2018). On Unix-like systems, the rar utilities are split in two: while unrar is publicly available, installing rar requires an additional proprietary repository, which few users bother with since xz and 7zip are free and often preferred. Because of its license, rar usage appears to be declining.

Algorithms:

  • RAR proprietary
  • LZSS Lempel-Ziv
  • Deflate Standard LZ77-based algorithm

 

Unrar usage

UNRAR 5.30 beta 4 freeware      Copyright (c) 1993-2015 Alexander Roshal

Usage:     unrar <command> -<switch 1> -<switch N> <archive> <files...>
               <@listfiles...> <path_to_extract\>

<Commands>
  e             Extract files without archived paths
  l[t[a],b]     List archive contents [technical[all], bare]
  p             Print file to stdout
  t             Test archive files
  v[t[a],b]     Verbosely list archive contents [technical[all],bare]
  x             Extract files with full path

<Switches>
  -             Stop switches scanning
  @[+]          Disable [enable] file lists
  ad            Append archive name to destination path
  ag[format]    Generate archive name using the current date
  ai            Ignore file attributes
  ap<path>      Set path inside archive
  c-            Disable comments show
  cfg-          Disable read configuration
  cl            Convert names to lower case
  cu            Convert names to upper case
  dh            Open shared files
  ep            Exclude paths from names
  ep3           Expand paths to full including the drive letter
  f             Freshen files
  id[c,d,p,q]   Disable messages
  ierr          Send all messages to stderr
  inul          Disable all messages
  kb            Keep broken extracted files
  n<file>       Additionally filter included files
  n@            Read additional filter masks from stdin
  n@<list>      Read additional filter masks from list file
  o[+|-]        Set the overwrite mode
  ol[a]         Process symbolic links as the link [absolute paths]
  or            Rename files automatically
  ow            Save or restore file owner and group
  p[password]   Set password
  p-            Do not query password
  r             Recurse subdirectories
  sc<chr>[obj]  Specify the character set
  sl<size>      Process files with size less than specified
  sm<size>      Process files with size more than specified
  ta<date>      Process files modified after <date> in YYYYMMDDHHMMSS format
  tb<date>      Process files modified before <date> in YYYYMMDDHHMMSS format
  tn<time>      Process files newer than <time>
  to<time>      Process files older than <time>
  ts<m,c,a>[N]  Save or restore file time (modification, creation, access)
  u             Update files
  v             List all volumes
  ver[n]        File version control
  vp            Pause before each volume
  x<file>       Exclude specified file
  x@            Read file names to exclude from stdin
  x@<list>      Exclude files listed in specified list file
  y             Assume Yes on all queries

 

Winrar examples

On Windows, installing WinRAR also provides command line utilities that can compress and decompress using the RAR algorithm. The unrar command takes the same arguments as on Unix-like systems:

Create a zip backup:

winrar a -afzip backup.zip file(s)

Create a rar backup:

winrar a backup.rar file(s)

Create a rar backup recursively:

winrar a -r backup.rar path

Test backup integrity:

unrar t backup.rar

List backup content:

unrar v backup.rar

Extract specific files in a backup file + directory structure:

unrar x backup.rar *.ext [extractfolder\]

Extract backup without directory structure:

unrar e backup.rar [extractfolder\]

 

7-zip

7zip is a modern, free (Pavlov 2018a) backup compression utility developed in 1999 by another Russian engineer. It uses a variety of algorithms to compress and decompress many backup formats, including tar. It can extract RAR archives but cannot create them, due to licensing (Pavlov 2018b). Its flagship algorithm was LZMA, and is today LZMA2 (Pavlov 2018c). It offers the best compression ratio of the utilities compared here, at the cost of speed. It uses a very large dictionary with newer coding techniques, which explains both the ratio and the slow compression. Decompression, on the other hand, is almost as fast as DEFLATE. It supports file spanning and AES encryption.

One can install p7zip on Linux, but a more common utility called xz also uses LZMA/LZMA2 compression (Collin 2016). Just like gzip, xz compresses data streams and cannot store multiple files by itself (Himanshu 2012).

Algorithms:

  • LZMA2 Improved version of LZMA
  • LZMA Improved and optimized version of LZ77 algorithm
  • PPMD Dmitry Shkarin’s PPMdH with small changes
  • BZip2 Standard BWT algorithm
  • BCJ Converter for 32-bit x86 executables
  • BCJ2 Converter for 32-bit x86 executables
  • Deflate Standard LZ77-based algorithm

p7zip: Differences between 7z, 7za and 7zr binaries

The package p7zip usually includes three binaries, 7z, 7za, and 7zr (Vinet and Griffin 2017). Their differences are:

  • 7z: uses plugins to handle archives.
  • 7za: a stand-alone executable. 7za handles fewer archive formats than 7z, but does not need any plugin.
  • 7zr: a stand-alone executable. 7zr does not need any plugin either, and is a light version of 7za that only handles 7z archives.

 

7zip Usage

7-Zip (a) 9.38 beta  Copyright (c) 1999-2014 Igor Pavlov  2015-01-03
p7zip Version 9.38.1 (locale=C,Utf16=off,HugeFiles=on,2 CPUs)

Usage: 7za <command> [<switches>...] <archive_name> [<file_names>...]
       [<@listfiles...>]

<Commands>
  a : Add files to archive
  b : Benchmark
  d : Delete files from archive
  e : Extract files from archive (without using directory names)
  h : Calculate hash values for files
  l : List contents of archive
  rn : Rename files in archive
  t : Test integrity of archive
  u : Update files to archive
  x : eXtract files with full paths
<Switches>
  -- : Stop switches parsing
  -ai[r[-|0]]{@listfile|!wildcard} : Include archives
  -ax[r[-|0]]{@listfile|!wildcard} : eXclude archives
  -bd : Disable percentage indicator
  -i[r[-|0]]{@listfile|!wildcard} : Include filenames
  -m{Parameters} : set compression Method
  -o{Directory} : set Output directory
  -p{Password} : set Password
  -r[-|0] : Recurse subdirectories
  -scs{UTF-8|UTF-16LE|UTF-16BE|WIN|DOS|{id}} : set charset for list files
  -sfx[{name}] : Create SFX archive
  -si[{name}] : read data from stdin
  -slt : show technical information for l (List) command
  -so : write data to stdout
  -ssc[-] : set sensitive case mode
  -t{Type} : Set type of archive
  -u[-][p#][q#][r#][x#][y#][z#][!newArchiveName] : Update options
  -v{Size}[b|k|m|g] : Create volumes
  -w[{path}] : assign Work directory. Empty path means a temporary directory
  -x[r[-|0]]]{@listfile|!wildcard} : eXclude filenames
  -y : assume Yes on all queries

 

7zip examples

Create a backup:

7z a backup.7z archiveDir

List backup content:

7z l backup.7z

Check integrity of a backup:

7z t backup.7z

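Extract a backup with its directory structure (the e command would instead extract every file into a single directory):

7z x backup.7z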
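The usage text lists two extraction commands, e (without directory names) and x (with full paths), and the difference matters when restoring a directory tree. The sketch below assumes a binary named 7z is on the PATH and skips itself otherwise; the directory names (archiveDir, with-paths, flat) are invented.

```shell
set -eu

status=skipped
if command -v 7z >/dev/null 2>&1; then
  work=$(mktemp -d)
  cd "$work"
  mkdir -p archiveDir/sub
  echo "top" > archiveDir/top.txt
  echo "nested" > archiveDir/sub/nested.txt

  7z a backup.7z archiveDir > /dev/null

  # x restores the stored paths; -o sets the output directory (no space).
  7z x backup.7z -owith-paths -y > /dev/null

  # e drops the directory names, so all files land directly in the output directory.
  7z e backup.7z -oflat -y > /dev/null

  status=ok
fi
echo "7z example: $status"
```

With x, nested.txt is restored as with-paths/archiveDir/sub/nested.txt; with e, it sits directly in flat/, where files that share a name in different directories would collide.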

Update (refresh) files inside a backup:

7z u backup.7z backupDir

Create a maximum-compression LZMA2 backup with tar and xz on the fly:

tar -cf - path/ | xz -9 -c - > archive.tar.xz
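The matching test and restore steps follow the same pattern as with gzip. This is a sketch that assumes xz is installed and skips itself otherwise; it uses the default preset rather than -9 to keep memory use low, and the restore directory is a made-up name.

```shell
set -eu

status=skipped
if command -v xz >/dev/null 2>&1; then
  work=$(mktemp -d)
  cd "$work"
  mkdir -p path restore
  echo "payload" > path/file.txt

  # Same pipeline as above, at the default preset (-9 can be added back).
  tar -cf - path | xz -c > archive.tar.xz

  # -t verifies the integrity of the compressed file.
  xz -t archive.tar.xz

  # -dc decompresses to stdout; tar unpacks the stream into restore/.
  xz -dc archive.tar.xz | tar -xf - -C restore

  cmp path/file.txt restore/path/file.txt
  status=ok
fi
echo "xz example: $status"
```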

 

Comparison

A thorough comparison can be found on Wikipedia; here is a quick comparison table based on my personal experience:

| Name | Recommended extension | Algorithm | Cost | Compression speed | Compression ratio |
|---|---|---|---|---|---|
| Tar (original) | tar | None | Free | Fastest | None |
| Zip | zip | DEFLATE | Free | Fast | Average |
| Gzip | gz | DEFLATE | Free | Fast | Average |
| Rar | rar | RAR | Licensed | Slow | Better |
| 7-zip / xz | 7z / xz | LZMA2 | Free | Slowest | Best |

Notes:

  • Compression ratios depend on the data being compressed: none of these tools does well on media files, for instance, because such files are usually already compressed.
  • The comparison above is valid between the utilities mentioned only.
  • tar and, to some extent, any compression software do not actually require a file extension. However, it is best practice to use one for good file management.
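The first note is easy to check with gzip alone. In this sketch, random bytes stand in for already-compressed media (an assumption: real media files behave similarly but not identically), and the file names and sizes are arbitrary.

```shell
set -eu

work=$(mktemp -d)
cd "$work"

# Highly repetitive text, the best case for any of these compressors.
awk 'BEGIN { for (i = 0; i < 20000; i++) print "the same log line again" }' > text.dat

# Random bytes, used here as a stand-in for already-compressed media.
head -c 200000 /dev/urandom > random.dat

# Helper: size of a file in bytes.
size_of() { wc -c < "$1" | tr -d ' '; }

gzip -9 -c text.dat > text.dat.gz
gzip -9 -c random.dat > random.dat.gz

text_in=$(size_of text.dat)
text_out=$(size_of text.dat.gz)
rand_in=$(size_of random.dat)
rand_out=$(size_of random.dat.gz)

echo "text:   $text_in -> $text_out bytes"
echo "random: $rand_in -> $rand_out bytes"
```

The repetitive text shrinks to a small fraction of its size, while the random data does not shrink at all and typically grows by a few bytes of gzip overhead.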

 

Works Cited

Adler, Mark. 2008. “Zip: Package And Compress (Archive) Files.” PTC Inc. https://www.mkssoftware.com/docs/man1/zip.1.asp.
Collin, Lasse. 2016. “The .Xz File Format.” Tukaani. December 30. https://tukaani.org/xz/format.html.
Deutsch, P, and Inc. Aladdin Enterprises. 1996. “DEFLATE Compressed Data Format Specification Version 1.3.” Request for Comments. May. https://tools.ietf.org/html/rfc1951.
Dorion, Pierre. 2008. “Backup vs. Archive.” TechTarget. October. https://searchdatabackup.techtarget.com/tip/Backup-vs-archive.
Free Software Foundation. 2018. “GNU Operating System.” Free Software Foundation Inc. March. https://www.gnu.org/.
Gailly, Jean-Loup. 2003. “The Gzip Home Page.” Gzip.Org. July 27. http://www.gzip.org/.
Gailly, Jean-Loup. 2017. “Zlib: A Massively Spiffy Yet Delicately Unobtrusive Compression Library.” Zlib. January 15. https://www.zlib.net/.
Pavlov, Igor. 2018a. “7-Zip Copyright (C) 1999-2018 Igor Pavlov.” 7-Zip. https://www.7-zip.org/license.txt.
———. 2018b. “7-Zip Format.” 7-Zip. https://www.7-zip.org/7z.html.
———. 2018c. “History of the 7-Zip.” 7-Zip. March 4. https://www.7-zip.org/history.txt.
PKWARE, Inc. 2017. “ZIP File Format (PKWARE).” Digital Preservation at the Library of Congress. July 27. https://www.loc.gov/preservation/digital/formats/fdd/fdd000354.shtml.
“Tar(5) — Format Of Tape Archive Files.” 2004. FreeBSD File Formats Manual. May 20. https://www.freebsd.org/cgi/man.cgi?query=tar&sektion=5&manpath=FreeBSD+7.0-RELEASE.
Vinet, Judd, and Aaron Griffin. 2017. “P7zip.” Arch Linux. December 5. https://wiki.archlinux.org/index.php/P7zip.
win.rar, GmbH. 2018. “RAR 5.0 Archive Format.” Win.Rar GmbH. https://www.rarlab.com/technote.htm.
win.rar GmbH. 2018. “Rar And Winrar End User License Agreement (Eula).” Rarlab GmbH. https://www.rarlab.com/license.htm.


Copyright IT Cooking© All rights reserved. | Production by Doctus IT LLC.