Digitizing Artifacts: Digital image file types explained

There are many file types used to encode digital images. The choices are similar but have different characteristics, and each is best suited to specific applications. You will often hear file types described as using “lossy” or “lossless” compression.

Lossless  is a compression technique that decompresses data back to its original form without any loss. The decompressed file and the original are identical. All compression methods used to compress text, databases and other business data are lossless. For example, the ZIP archiving technology (PKZIP, WinZip, etc.) is a widely used lossless method.
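A lossless round trip can be checked in a few lines. This is a minimal sketch using Python's standard `zlib` module, which implements DEFLATE, the method most ZIP tools use by default. The sample text is made up for illustration.

```python
import zlib

# Hypothetical sample data: repetitive text compresses well.
original = b"Census record, 1880, Ohio. " * 200

compressed = zlib.compress(original)     # lossless DEFLATE compression
restored = zlib.decompress(compressed)   # reverse the compression

print(len(original), "bytes before,", len(compressed), "bytes after")
print("identical after round trip:", restored == original)
```

The restored bytes compare equal to the original, which is exactly what "lossless" means.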

Lossy is a compression technique that does not decompress data back to 100% of the original. Lossy methods provide high degrees of compression and result in very small compressed files, but there is a certain amount of loss when they are restored. Audio, video and some imaging applications can tolerate loss, and in many cases, it may not be noticeable to the human ear or eye. In other cases, it may be noticeable, but not that critical to the application. The more tolerance for loss, the smaller the file can be compressed, and the faster the file can be transmitted over a network. Examples of lossy file formats are MP3, AAC, MPEG and JPEG.
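The idea of irreversible loss can be illustrated with a toy example. Real lossy formats such as JPEG and MP3 use far more sophisticated methods. The sketch below only assumes a simple rounding step (the `quantize` function and the sample values are hypothetical) to show that discarded detail cannot be recovered.

```python
def quantize(samples, step=16):
    """Lossy step: round each 0-255 sample down to a multiple of `step`."""
    return [(s // step) * step for s in samples]

# Hypothetical brightness values for eight pixels.
original = [12, 13, 15, 200, 201, 203, 90, 91]
restored = quantize(original)

# Nearby values collapse to the same number, so fewer distinct values
# need to be stored, but the original values cannot be reconstructed.
errors = [abs(a - b) for a, b in zip(original, restored)]
print("restored:", restored)
print("largest error:", max(errors))
```

A larger `step` means more loss and fewer distinct values to store, which mirrors the trade-off between tolerance for loss and file size described above.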

The file types that you will see referred to most often in digitizing are BMP, GIF, JPG, PNG, RAW, and TIFF. The following is a brief overview of each.

BMP is an uncompressed proprietary format invented by Microsoft. There is really no reason to ever use this format.

GIF creates a table of up to 256 colors from a pool of 16 million. If the image has fewer than 256 colors, GIF can render the image exactly. When the image contains many colors, software that creates the GIF uses any of several algorithms to approximate the colors in the image with the limited palette of 256 colors available. Better algorithms search the image to find an optimum set of 256 colors. Sometimes GIF uses the nearest color to represent each pixel, and sometimes it uses “error diffusion” to adjust the color of nearby pixels to correct for the error in each pixel.
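The "nearest color" approach can be sketched as follows. This is an illustration only, not the algorithm of any particular GIF encoder: the four-entry palette, the sample pixels, and the `nearest_color` helper are all assumptions, and error diffusion is not shown.

```python
def nearest_color(pixel, palette):
    """Return the palette entry closest to `pixel` (squared RGB distance)."""
    return min(palette, key=lambda c: sum((a - b) ** 2 for a, b in zip(pixel, c)))

# Hypothetical palette: black, white, red, blue (a real GIF allows up to 256).
palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (0, 0, 255)]

# Hypothetical true-color pixels that are not in the palette.
pixels = [(250, 10, 5), (20, 20, 30), (240, 240, 250), (10, 0, 200)]

mapped = [nearest_color(p, palette) for p in pixels]
for before, after in zip(pixels, mapped):
    print(before, "->", after)
```

Each pixel is replaced by the closest available palette color, which is where the loss comes from when an image has more colors than the palette can hold.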

GIF achieves compression in two ways. First, it reduces the number of colors of color-rich images, thereby reducing the number of bits needed per pixel, as just described. Secondly, it replaces commonly occurring patterns (especially large areas of uniform color) with a short abbreviation; instead of storing “white, white, white, white, white,” it stores “5 white.”
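The "5 white" idea is known as run-length encoding. Real GIF files use the LZW algorithm, a more general dictionary-based scheme, so the sketch below only mirrors the simplified illustration in the text. The function names are hypothetical.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse each run of identical values into a (count, value) pair."""
    return [(len(list(run)), value) for value, run in groupby(pixels)]

def rle_decode(runs):
    """Expand (count, value) pairs back into the original sequence."""
    return [value for count, value in runs for _ in range(count)]

row = ["white"] * 5 + ["black"] * 2 + ["white"] * 3
encoded = rle_encode(row)

print(encoded)
print("round trip exact:", rle_decode(encoded) == row)
```

Ten stored values shrink to three pairs, and decoding returns the row exactly, so this step on its own is lossless.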

Thus, GIF is “lossless” only for images with 256 colors or fewer. For a rich, true color image, GIF may “lose” 99.998% of the colors.
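The scale of that loss follows from simple arithmetic. The sketch assumes the worst case, an image that uses every one of the roughly 16 million colors available in 24-bit true color.

```python
total_colors = 2 ** 24   # 16,777,216 possible colors in 24-bit true color
palette_size = 256       # maximum number of colors in a GIF palette

lost_fraction = 1 - palette_size / total_colors
print(f"kept: {palette_size} of {total_colors:,} colors")
print(f"lost: {lost_fraction:.3%}")
```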

JPG is optimized for photographs and similar continuous tone images that contain many, many colors. It can achieve astounding compression ratios even while maintaining very high image quality. GIF compression is unkind to such images. JPG works by analyzing images and discarding kinds of information that the eye is least likely to notice. It stores information as 24-bit color. Note: The degree of compression for a JPG is adjustable. At moderate compression levels of photographic images, it is very difficult for the eye to discern any difference from the original, even at extreme magnification. Compression factors of more than 20 are often quite acceptable. Better graphics programs, such as Paint Shop Pro and Photoshop, allow you to view the image quality and file size as a function of compression level, so that you can conveniently choose the balance between quality and file size.
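Compression factors translate directly into file sizes. This is a rough sketch, assuming a hypothetical 3000 x 2000 pixel scan stored as 24-bit color (3 bytes per pixel). The function names are illustrative only, and real JPG sizes depend on the image content.

```python
def uncompressed_bytes(width, height, bits_per_pixel=24):
    """Size of the raw pixel data, ignoring any file headers."""
    return width * height * bits_per_pixel // 8

def estimated_jpg_bytes(width, height, compression_factor):
    """Approximate size after compressing by the given factor."""
    return uncompressed_bytes(width, height) / compression_factor

raw = uncompressed_bytes(3000, 2000)
jpg = estimated_jpg_bytes(3000, 2000, compression_factor=20)

print(f"uncompressed: {raw / 1_000_000:.1f} MB")
print(f"at factor 20: {jpg / 1_000_000:.1f} MB")
```

Under these assumptions, an 18 MB image drops to roughly 0.9 MB at a compression factor of 20.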

PNG is a lossless storage format. However, in contrast with common TIFF usage, it looks for patterns in the image that it can use to compress file size. The compression is exactly reversible, so the image is recovered exactly. PNG is superior to GIF. It produces smaller files and allows more colors. PNG also supports partial transparency. Partial transparency can be used for many useful purposes, such as fades and antialiasing of text. Unfortunately, older versions of Microsoft’s Internet Explorer (version 6 and earlier) do not properly support PNG transparency, so Web authors who need to support those browsers must avoid using transparency in PNG images or direct their users to another browser such as Firefox. PNG is of principal value in two applications:

  • If you have an image with large areas of exactly uniform color, but contains more than 256 colors, PNG is your choice. Its strategy is similar to that of GIF, but it supports 16 million colors, not just 256.
  • If you want to display a photograph exactly without loss on the Web, PNG is your choice. Later generation Web browsers support PNG, and PNG is the only lossless format that Web browsers support.
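Why uniform areas suit PNG can be seen by compressing two sets of pixel data with DEFLATE, the lossless method PNG uses internally. This sketch leaves out PNG's row filters and file structure and works on raw RGB bytes only. The 100 x 100 image size and the pixel values are made up for illustration.

```python
import random
import zlib

WIDTH, HEIGHT = 100, 100
n_bytes = WIDTH * HEIGHT * 3   # 3 bytes (R, G, B) per pixel

# One flat color repeated for every pixel.
uniform = bytes([30, 90, 200]) * (WIDTH * HEIGHT)

# Random noise: no repeated patterns for the compressor to exploit.
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(n_bytes))

uniform_packed = zlib.compress(uniform)
noisy_packed = zlib.compress(noisy)

print("uniform:", n_bytes, "->", len(uniform_packed), "bytes")
print("noisy:  ", n_bytes, "->", len(noisy_packed), "bytes")
print("both exact:", zlib.decompress(uniform_packed) == uniform
      and zlib.decompress(noisy_packed) == noisy)
```

Both images are recovered exactly, but only the uniform one shrinks dramatically. Photographs fall between the two extremes, which is why lossless PNG files of photographs are much larger than JPG files.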

RAW is an image output option available on some digital cameras. Though lossless, a RAW file is three or four times smaller than a TIFF file of the same image. The disadvantage is that there is a different RAW format for each manufacturer, so you may have to use the manufacturer’s software to view the images. (Some graphics applications can read some manufacturers’ RAW formats.) Use RAW only for in-camera storage, and copy or convert to TIFF, PNG, or JPG as soon as you transfer to your PC. You do not want your image archives to be in a proprietary format. Although several graphics programs can now read the RAW format for many digital cameras, it is unwise to rely on any proprietary format for long-term storage. Will you be able to read a RAW file in five years? In twenty? JPG is the format most likely to be readable in fifty years. Thus, it is appropriate to use RAW to store images in the camera and perhaps for temporary lossless storage on your PC, but be sure to create a TIFF, or better still a PNG or JPG, for archival storage.

TIFF is, in principle, a very flexible format that can be lossless or lossy. The details of the image storage algorithm are included as part of the file. In practice, TIFF is used almost exclusively as a lossless image storage format that uses no compression at all. Most graphics programs that use TIFF do not use compression. Consequently, file sizes are quite big. (Sometimes a lossless compression algorithm called LZW is used, but it is not universally supported.)

In addition to the file types previously described, you will also see PDF and proprietary formats such as PSD and PSP, which are used by graphics programs:

  • The files in Photoshop have the PSD extension, while Paint Shop Pro files use PSP. These are the preferred working formats as you edit images in the software, because only the proprietary formats retain all the editing power of the programs. These packages use layers, for example, to build complex images, which allow images to be rearranged under and over each other for placement.  This information may be lost in the nonproprietary formats such as TIFF and JPG. However, be sure to save your end result as a standard TIFF or JPG, or you may not be able to view it in a few years when your software has changed.
  • Portable Document Format (PDF) is an open file format created by Adobe Systems. It is used for representing two-dimensional documents in a device-independent and resolution-independent fixed-layout document format. Each PDF file encapsulates a complete description of a document that includes the text, fonts, images, and graphics that compose the document. PDF files do not encode information that is specific to the application software, hardware, or operating system used to create or view the document. This feature ensures that a valid PDF will render exactly the same regardless of its origin or destination. PDF files are most appropriately used to encode the exact look of a document in a device-independent way. While PDF can describe very simple one-page documents, it may also be used for multi-page, complex documents that use a variety of fonts, graphics, colors, and images.

Another reason for the many file types is that images differ in the number of colors they contain. If an image has few colors, a file type can be designed to exploit this as a way of reducing file size.
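The saving from a small color count can be estimated by working out how many bits are needed to give each color its own number. This is a minimal sketch, and the helper name `bits_per_pixel` is hypothetical.

```python
def bits_per_pixel(color_count):
    """Smallest number of bits that can give each color a distinct index."""
    return max(1, (color_count - 1).bit_length())

for colors in (2, 16, 256, 2 ** 24):
    print(f"{colors:>10,} colors -> {bits_per_pixel(colors)} bits per pixel")
```

A two-color image such as a black-and-white document scan needs only 1 bit per pixel, compared with 24 bits for true color, before any further compression is applied.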
