File formats on the Internet



Copyright © 1995 Elsevier Science Ltd. Printed in Great Britain. All rights reserved

0098-3004/95 $9.50 + 0.00

FILE FORMATS ON THE INTERNET

RODNEY PERKINS

Texas Center for Superconductivity, University of Houston, Houston, TX 77204, U.S.A. e-mail rperkins@uh.edu

(Received 25 October 1994; accepted 1 December 1994)

Abstract: The Internet undoubtedly is one of the best and most comprehensive sources of information in the world. With an ever-expanding variety of navigational tools, the watchful user can locate applications, graphics, and text to fit almost any need. Making optimum use of these resources, however, requires a knowledge of a variety of file formats, the ways in which files are encoded and compressed, and the programs that allow you to access and use the files.

Key Words: Internet, Files

ENCODING, DECODING, AND COMPRESSION OF DOWNLOADED FILES

Files available for downloading on the Internet usually are encoded and compressed. Encoding allows the user to transfer files through electronic mail or other programs which were not designed to handle binary data. Encoding also allows files to be transferred between computers of different types with a minimum of confusion.

A conversion program will turn the 8-bit data into a 7-bit ASCII text file that can be transmitted to another user’s machine. With the proper program, the file then can be decoded into an executable application (Engst, 1993, p. 88-91). Files are compressed to save storage space on host computers and speed transfer of files between computers.

Encoding formats include uuencode, Binhex, and MIME (Table 1). Uuencode (.uu) originates from the Unix operating system and can be translated using a program termed uudecode. Binhex (.hqx) is native to Macintosh machines and can be translated (along with numerous other formats) using Stuffit Expander. MIME, which stands for Multi-Purpose Internet Mail Extensions, is a set of cross-platform standards that allows a user to add features such as PostScript language, digital audio, and graphics to an electronic mail message (Sweet, 1994). These files can be decoded using a MIME-capable e-mail program such as Eudora (Mac and PC versions) or any other mail user agent (MUA). World Wide Web (WWW) browsers use MIME extensions to recognize downloads of different types and launch appropriate external applications. For example, after downloading a file compressed with Stuffit (.sit), a browser will recognize the extension and automatically launch the program that has been set to decode and decompress it.
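The conversion of 8-bit data into 7-bit ASCII text described above can be illustrated with a short sketch. It is written in Python purely for illustration and is not one of the programs discussed in this paper. It relies on the standard-library calls binascii.b2a_uu and binascii.a2b_uu, which encode and decode one uuencoded line of at most 45 bytes at a time. The function names to_ascii_lines and from_ascii_lines are hypothetical, and the "begin" and "end" lines of a complete .uu file are omitted.

```python
import binascii

UU_CHUNK = 45  # uuencode packs at most 45 input bytes into one text line


def to_ascii_lines(data: bytes) -> list[str]:
    """Encode arbitrary 8-bit data as uuencode-style body lines of 7-bit ASCII."""
    lines = []
    for start in range(0, len(data), UU_CHUNK):
        chunk = data[start:start + UU_CHUNK]
        # b2a_uu returns one encoded line ending in a newline; drop the newline.
        lines.append(binascii.b2a_uu(chunk).decode("ascii").rstrip("\n"))
    return lines


def from_ascii_lines(lines: list[str]) -> bytes:
    """Decode the lines produced by to_ascii_lines back into the original bytes."""
    return b"".join(binascii.a2b_uu(line.encode("ascii")) for line in lines)


if __name__ == "__main__":
    payload = bytes(range(256))  # every possible 8-bit value
    encoded = to_ascii_lines(payload)
    print(len(encoded), "text lines; round trip ok:",
          from_ascii_lines(encoded) == payload)
```

Every character in the encoded lines has a code below 128, which is why such text survives mail systems that were not designed for binary data. The cost is size: each group of three input bytes becomes four output characters.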

In the area of file compression, Unix uses a program termed Compress (.Z). Files compressed with this program may be combined with the tape archive format (.tar.Z) and in some situations are also uuencoded (.tar.Z.uu). Some form of Compress also is available for MS-DOS, Macintosh, and Amiga systems (Gailly, 1994). PKZIP (.zip) and LHARC (.arc) are the two main compression programs for DOS and Windows platforms. Macintosh files usually are archived with Compact Pro (.cpt) or Stuffit (.sit). Many programs are designed to decompress and decode numerous file formats, so it may be unnecessary to use more than one program for these purposes.
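A stacked name such as .tar.Z.uu records the order in which a file was processed, so it must be undone from right to left. The sketch below, in Python and for illustration only, reads the extensions off a file name and lists the steps in the order they would be applied. The STEPS mapping and the function unpack_plan are hypothetical names, and the mapping covers only some of the extensions mentioned in the text. The sketch does not decode or decompress anything.

```python
# Hypothetical mapping from extension to the step that undoes it,
# based on the formats named in the text (not an exhaustive list).
STEPS = {
    ".uu": "uudecode",
    ".hqx": "decode Binhex",
    ".Z": "uncompress (Compress)",
    ".tar": "extract tape archive",
    ".zip": "unzip (PKZIP)",
    ".sit": "expand (Stuffit)",
    ".cpt": "expand (Compact Pro)",
}


def unpack_plan(filename: str) -> list[str]:
    """Return the steps needed to unpack filename, outermost layer first."""
    plan = []
    name = filename
    while True:
        for ext, step in STEPS.items():
            # Extensions are matched case-sensitively: ".Z" is not ".z".
            if name.endswith(ext):
                plan.append(step)
                name = name[:-len(ext)]
                break
        else:
            # No known extension is left on the name, so the plan is complete.
            return plan


if __name__ == "__main__":
    for step in unpack_plan("data.tar.Z.uu"):
        print(step)
```

For "data.tar.Z.uu" the plan is uudecode first, then uncompress, then extraction of the tape archive, which is the reverse of the order in which the file was prepared.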

GRAPHICS FILE FORMATS

As Internet services move away from the command line and towards graphical user interfaces, graphic file formats become increasingly important. JPEG (Joint Photographic Experts Group) and Compuserve’s GIF (Graphics Interchange Format) are two types of graphics files found on the Internet. JPEG is a “lossy” compression in which unnecessary data is removed from an image in order to reduce its size (Lane, 1994). Because lossy compression can contribute to a noticeable degradation in image quality, the amount of data removed can be determined by the user. JPEG is best used for compressing continuous-tone images such as photographs (McClelland, 1993, p. 143-144). A JPEG-compressed image can store up to 24 bits of information per pixel (16 million colors). GIF uses a “lossless” compression (specifically LZW) in which no data are removed from the image. It can handle up to 8 bits of information per pixel (256 colors or fewer). GIF is designed for producing the most compacted file (McClelland, 1993, p. 136).

Of special interest is the subject of interlaced GIFs. One of the most attractive features of the WWW and its Hypertext Markup Language (HTML) is its ability to present text and graphics in a fashion similar to a desktop published page. When GIF files are transferred from a server to a user’s browser, they are loaded from top to bottom. The Netscape browser supports the use of interlaced GIFs. These files appear layer-by-layer when being transferred, providing a more visually appealing effect than regular GIFs. Many graphics programs support the option of saving GIF files in an interlaced format.
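The layer-by-layer appearance of an interlaced GIF follows from the order in which its rows are stored. The four-pass scheme used below comes from the GIF89a specification rather than from the sources cited in this paper: every eighth row starting at row 0, every eighth row starting at row 4, every fourth row starting at row 2, and finally every second row starting at row 1. The Python sketch is illustrative only, and the function name interlaced_row_order is hypothetical.

```python
def interlaced_row_order(height: int) -> list[int]:
    """Return image row numbers in the order an interlaced GIF stores them."""
    # (first row, step) for each of the four passes defined by GIF89a.
    passes = [(0, 8), (4, 8), (2, 4), (1, 2)]
    order = []
    for first_row, step in passes:
        order.extend(range(first_row, height, step))
    return order


if __name__ == "__main__":
    print(interlaced_row_order(10))  # [0, 8, 4, 2, 6, 1, 3, 5, 7, 9]
```

After the first pass a viewer has received roughly one row in eight, spread over the whole height of the image, so a coarse preview can be drawn and then refined as the later passes arrive.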

One point to consider when working with images intended for the Internet is that many computers (such as those using VGA monitors) do not support the display of more than 256 colors. Some computers can automatically dither an image to suit the resolution of the screen. In other situations, images must be designed specifically for low-resolution screens. This can be achieved by using an image-processing program such as Adobe Photoshop or NIH Image (which will be discussed later in this text) to convert a 16- or 24-bit image to 8 bits. Photoshop uses a command termed “Indexed Color” whereas NIH Image automatically reduces images to 8 bits. Using these programs, the desired image can be mapped to a palette of 256 colors. Each pixel will be assigned the color from the palette that best fits the original. A look-up table or LUT will be created, containing the closest approximations of the original colors (McClelland, 1993, p. 531-533). This look-up table is a part of the image and will follow it if placed in other programs. The image then can be dithered to effectively distribute the colors across the image (MacFarland, 1994, p. 12).

For more detailed information, O’Reilly and Associates has published The Encyclopedia of Graphics File Formats (ISBN 1-56592-058-9), which contains specifications for 100 different graphic file formats. The book also includes a CD-ROM (ISO- format) which contains public domain image viewing and processing utilities for MS-DOS, OS/2, Unix, and Macintosh operating systems.
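The mapping of a 24-bit image onto a 256-color palette, described earlier in this section, can be sketched as a nearest-color search. The Python code below is an illustration under stated assumptions and is not the method used by Photoshop or NIH Image: "best fit" is taken to mean the smallest squared distance in RGB space, the palette is supplied by the caller rather than derived from the image, and dithering is left out. The function names nearest_index and index_image are hypothetical.

```python
MAX_PALETTE = 2 ** 8  # an 8-bit index can address 256 palette entries


def nearest_index(pixel, palette):
    """Return the index of the palette color closest to pixel.

    Closeness is measured as squared distance in RGB space (an assumption).
    """
    return min(
        range(len(palette)),
        key=lambda i: sum((p - q) ** 2 for p, q in zip(pixel, palette[i])),
    )


def index_image(pixels, palette):
    """Replace each 24-bit (r, g, b) pixel by an 8-bit index into palette."""
    if len(palette) > MAX_PALETTE:
        raise ValueError("an 8-bit image can use at most 256 palette entries")
    return [nearest_index(pixel, palette) for pixel in pixels]


if __name__ == "__main__":
    palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]  # black, white, red
    pixels = [(10, 10, 10), (250, 240, 245), (200, 30, 20)]
    print(index_image(pixels, palette))  # [0, 1, 2]
```

The palette plays the role of the look-up table: the indexed image stores one byte per pixel instead of three, and the colors are recovered by looking each index up in the palette that travels with the image.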

IMAGE-PROCESSING SOFTWARE ON INTERNET

Most shareware and freeware graphics programs are designed for viewing images or converting them from one format to another. Three programs of note are JPEG View (Macintosh), Graphics Converter (Macintosh), and LView Pro (MS-Windows). JPEG View can be used to view and do limited manipulation of JPEG-compressed pictures. In addition to having an attractive set of paint and image-editing tools, Graphics Converter can read or write dozens of graphic formats, including TIFF, JPEG, GIF, and PICT. The program also will read XBM (X-Windows bitmap), SUN (Sun workstations), Atari, and other formats. JPEG View and Graphics Converter can be obtained at “gopher://sumex-aim.stanford.edu/info-mac”. This archive is an excellent resource for Macintosh shareware and freeware. LView Pro can read and write JPEG, GIF, and BMP formats. LView Pro v1.9, along with numerous other MS-DOS and Windows programs, can be obtained at “gopher://micros.hensas.ac.uk/micros/ibmpc”.

Some programs provide a range of image-processing capabilities comparable to commercial products. NIH (National Institutes of Health) Image for the Macintosh easily can be considered a shareware counterpart to Adobe Photoshop, the most popular and powerful image-processing package on the current market. NIH Image provides many of Photoshop’s features, including plug-in filters. It also has features Photoshop does not have, such as the ability to digitize individual picture frames or entire movie files. NIH Image supports TIFF, PICT, PICS, and MacPaint file formats. NIH Image supports only 8-bit information. It will open 16- and 24-bit images (except for 24-bit TIFF) but automatically indexes them to 8 bits. It supports 24 bits only when digitizing color pictures. Otherwise, users must work with LUTs. NIH Image does not support alpha channels (an extra 8 bits of color information beyond red, green, and blue that can be used for masking or producing composite images) but provides a similar function through a feature termed stacks. NIH Image is available at “ftp://zippy.nimh.nih.gov/pub/nih-image”.

Table 1. Internet files

Extension      File format                             Description      Platform
.arc           LHARC                                   Compression      DOS
.bmp           MS Windows Bitmap                       Graphics format  MS Windows
.cpt           Compact Pro                             Compression      Macintosh
.gif           Graphics Interchange Format             Graphics format  ALL
.hqx           Binhex                                  Encoding scheme  Macintosh
.jpeg, .jpg    Joint Photographic Experts Group        Graphics format  ALL
MIME           Multi-Purpose Internet Mail Extensions  Encoding scheme  ALL
.mov, .moov    Quicktime                               Digital video    Macintosh, MS Windows
.mpeg, .mpg    Motion Picture Experts Group            Digital video    ALL
.sit           Stuffit                                 Compression      Macintosh
.tar           Tape Archive                                             Unix
.tiff          Tagged Image File Format                Graphics format  Macintosh, MS Windows
.uu            UUencode                                Encoding scheme  Unix
.xbm           X-Windows Bitmap                        Graphics format  X-Windows (Unix)
.Z             Compress                                Compression      Unix
.zip           PKZIP                                   Compression      DOS


DIGITAL VIDEO

Digital video files on the Internet are usually in one of two formats: Quicktime (.mov) or MPEG (.mpg). Similar to their single frame counterparts, these two formats apply compression algorithms to reduce file size. Compressing video files is important especially as they are considerably larger than other types, taking up 30 megabytes/sec for uncompressed video (Florence, 1994, p. 283).

In addition to being compressed within frames, moving picture files are compressed temporally or between frames. The algorithm will look for information that repeats itself between frames (such as static backgrounds) and remove it from consecutive frames. Compression can be of two different types: symmetrical and asymmetrical. Symmetric compression takes the same amount of time to compress and decompress a file. Asymmetric compression takes longer to compress but creates a file with more data, hence providing smoother playback.

Quicktime is the set of software components that handles the compression and decompression, recording and playback of digital video and audio for Macintosh computers. Quicktime has four main parts: the system software, compressors, file formats, and human interface (Jerram and Gosney, 1993, p. 69). Quicktime supports numerous types of compression/decompression schemes (termed codecs) including JPEG, Supermac’s Cinepak, and most recently, MPEG. Quicktime is implemented through software, allowing the flexibility to add new codecs and expand other features. Its basis in software, however, causes performance problems for slower machines. Quicktime will not run on computers with 68000 processors and runs slowly on 68020- and 68030-based machines. For effective use of Quicktime, a computer with a 68040 or RISC processor is required. Quicktime 1.6.2 is available at “ftp://ftp.support.com/pub/Apple SW updates/Macintosh/Supplemental System Software/quicktime1.6.2.sea.hqx”. Quicktime 1.1.1 for Microsoft Windows is available at “ftp://ftp.support.com/pub/Apple SW updates/Macintosh/Supplemental System Software/Quicktime for Windows (1.1.1).hqx”. Quicktime 2.0 can be purchased as part of Apple’s System 7.5 or as part of commercial software packages such as Adobe Premiere 4.0. Quicktime 2.0 is not available on the Internet or through other on-line services.

MPEG (Motion Picture Experts Group) is the international standard for compressing digital video and audio. MPEG has been implemented in phases, each of which provides compression for different purposes. MPEG is currently in Phase-2, with Phase-4 (low bit rate applications) in development. Phase-3 originally was designed for High-Definition Television but since has been abandoned. MPEG compression occurs in three different forms: video, audio and interleaved (video and audio combined) (Adler, 1994). MPEG compression differs from other schemes in that in addition to being compressed spatially and temporally, MPEG frames are compressed bidirectionally. Bidirectional frames (known as B-frames) are compressed based on motion of a previous frame and the future frame. MPEG compression is complicated enough to necessitate dedicated hardware. An archive at “ftp://ftp.crs4.it/mpeg” contains a variety of MPEG audio and video software for DOS, Windows, Macintosh, and Unix systems.

As there are many factors that contribute to the production of digital video, the quality may be poor. Choppy playback, audio and video dropouts, and low resolution plague many of these films, including those on professionally produced CD-ROMs (Florence, 1994). Factors to consider when preparing digital video include the capture rate or frames per second (fps), type of compression algorithms used, and the speed of both the CPU and hard drive to which you are capturing. Files are decompressed every time they are played.

CONCLUSION

Although some of what has been discussed has been specific to Macintosh computers, users can obtain similar programs for most platforms. It also is important to remember that most graphic or text files can be viewed across different platforms. As more users gain access to the Internet, the demand for cross-platform file formats and programs grows. This move should benefit us all. For those interested in more information, the Frequently Asked Questions (FAQ) files listed in the References are an excellent resource (see Adler, 1994; Grieggs, 1994). They also contain pointers to other documents, books, articles, and programs.

REFERENCES

Adler, M., 1994, MPEG (Motion Pictures Expert Group) frequently asked questions: ftp://ftp.cs.tu-berlin.de/pub/msdos/dos/graphics/mpegfaq 11.20 and 3O.zip (MS-DOS format).

Engst, A., 1993, Internet starter kit for Macintosh: Hayden, Indianapolis, 641 p.

Florence, M., 1994, Power programming: bring programs to life with digital video: PC Magazine, v. 13, no. 9, p. 283-292.

Gailly, J. L., 1994, Comp.Compression frequently asked questions: ftp://rtfm.mit.edu/pub/usenet/news.answers/compression-faq/parts 1-3. ASCII text, 231K (total).

Grieggs, J., 1994, Graphics frequently asked questions: ftp://rtfm.mit.edu/pub/usenet/news.answers/graphics-faq. ASCII text, 52K.

Jerram, P., and Gosney, M., 1993, Multimedia power tools: Random House, New York, 640 p.

Lane, T., 1994, JPEG (Joint Photographers Experts Group) frequently asked questions: ftp://rtfm.mit.edu/pub/usenet/news.answers/jpeg-faq. ASCII text, 62K.

MacFarland, T., 1994, Morph’s workshop: dither away the time: Morph’s Outpost, v. 2, no. 1, p. 12-13.

McClelland, D., 1993, MacWorld photoshop 2.5 bible: IDG, San Mateo, California, 685 p.

Sweet, J., 1994, MIME (Multi-Purpose Mail Extensions) frequently asked questions: ftp://rtfm.mit.edu/pub/usenet-by-group/news.answers/mail/mime-faq 1-3. ASCII text, 99K.