When you look at a printed document of whatever sort you will notice that the page is made up of text, images, or both. You can tell what is text and what isn't because you know what text looks like and you know how to read it.

The computer doesn't work that way. An electronic version of a document can look exactly the same as the printed copy that you just looked at, with text, images, or both. What you see as images the computer also sees as images, but what you see as text the computer may see either as text or as an image. This is because the computer uses different methods to store text and images, and the method that is used to store images can also be used to store text.

So why don't we store all of our page content as images rather than as text? We don't because the computer can store text far more efficiently than it can store images. Text stored as an image takes up much more space than the same text stored as text. Also, the text editors on your computer (word processors, DTP software, etc.) work with text stored as text. You need a graphics program to edit images and, as you can tell from my article on Replacing Text in a Graphic, editing text stored as an image is a relatively involved process.
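
To put rough numbers on the difference, the following Python sketch compares the space needed for a page stored as characters with the space needed for the same page stored as an uncompressed image. The page size, resolution, and character count are illustrative assumptions, not figures from any particular program, and the calculation ignores file headers and compression, so real image files will usually be smaller than the raw sizes shown.

```python
# Rough size comparison: a page stored as text versus as an uncompressed image.
# Every figure below is an illustrative assumption.

PAGE_WIDTH_IN = 8.5      # assumed page width in inches
PAGE_HEIGHT_IN = 11.0    # assumed page height in inches
DPI = 300                # assumed image resolution (dots per inch)
CHARS_PER_PAGE = 3000    # assumed number of characters on a full page


def text_bytes(num_chars: int) -> int:
    """Bytes for plain text, assuming one byte per character."""
    return num_chars


def image_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Bytes for an uncompressed bitmap of the page at the given colour depth."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


as_text = text_bytes(CHARS_PER_PAGE)
as_mono = image_bytes(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, DPI, 1)     # black and white
as_colour = image_bytes(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, DPI, 24)  # full colour

print(f"as text:           {as_text:>12,} bytes")
print(f"as 1-bit image:    {as_mono:>12,} bytes")
print(f"as 24-bit image:   {as_colour:>12,} bytes")
```

Under these assumptions the black and white image is a few hundred times the size of the text, and the full colour image is several thousand times the size.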

Given the larger space required to store text as images and the difficulty of editing it, why would we ever want to store text as images rather than as text? I can think of several reasons.

The most obvious is that your image requires labels on various parts of it in order to be properly understood. It is a relatively simple matter to add these text labels to your image in the appropriate locations. To place the image on your page and then place the labels (as text) in the correct positions, in such a way as to ensure that they stay there, is almost impossible. If you do succeed, the information that needs to be stored to position all of the text labels correctly may take up much more storage space than if you had taken the simpler approach and inserted the labels into the image in the first place. The text may be more difficult to edit at a later date but chances are that you'll never need to do this. Image labels are much better handled as a part of the image than as separate text.

What about text as an image rather than as labels in an image? As you know, text comes in various fonts, font sizes, etc., and when text is stored as text this information needs to be stored with it so that it displays properly. These attributes can readily be changed in a text editor to alter the size and appearance of the text. If the text is stored as a graphic then its appearance is fixed by the attributes that you specified when you created the text image. You cannot later edit the text image to change those attributes. This may sound like a disadvantage but there are a couple of advantages as well. One of these is text effects. You may not want just plain text (of whatever font, size, colour, etc.); you may want to add some special effects to your text. Depending on which editor you use you may be able to include some effects, but you are then relying on that software to reproduce the effect later. If you incorporate the effects into text stored as an image then that image will always display those effects regardless of the software used to display it, and many more text effects can be achieved (see for example the article on Textured Text - you can't achieve the effect shown at the bottom of that page without storing the text as an image).

Because text stored as a graphic does not rely on separately stored attributes, it is also more portable than text stored as text. Provided that the image format is one that is readable on another computer (see Choosing a Graphics Format), the text can be displayed on that computer and printed out with exactly the same appearance as it had on the computer that created it (subject only to differences in screen and printer resolution). With text stored in a text file, the receiving computer must have exactly the same fonts available as the originating computer, as well as compatible software that can read the file, for the text to appear the same on both computers (the only exception is when you Embed Fonts in a PDF). Even then, other considerations may result in the text having a different appearance.

Where the text is there as content (i.e., to provide the reader with information), the appearance of the text is of lesser importance and differences between its appearance on one computer and another don't matter. There are times when the text is there not only as content but where its appearance also matters (e.g., for text in a company logo). Where the appearance of the text is as important as what the text says, storing the text as an image is an appropriate alternative. When you do this you need to consider how much of the text you will store as an image and how much can be stored as text, since storing text as an image takes up a lot more space. This is of particular importance where the document containing the text image is to be transmitted electronically, because the transmission time increases along with the file size.
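
As a rough illustration of how transmission time grows with file size, the short Python sketch below estimates how long two files would take to send over the same link. The link speed and both file sizes are assumed values chosen only for the example, and the calculation ignores protocol overhead and latency.

```python
# Idealised transfer time for a file over a link of a given speed.
# The link speed and file sizes are assumptions made for illustration.

def transfer_seconds(file_bytes: int, link_bits_per_second: int) -> float:
    """Seconds to send the file, ignoring overhead and latency."""
    return file_bytes * 8 / link_bits_per_second


LINK_BPS = 1_000_000         # assumed link speed: 1 megabit per second
TEXT_VERSION = 3_000         # assumed size of a page stored as text, in bytes
IMAGE_VERSION = 1_000_000    # assumed size of the same page stored as an image

print(f"text:  {transfer_seconds(TEXT_VERSION, LINK_BPS):.3f} seconds")
print(f"image: {transfer_seconds(IMAGE_VERSION, LINK_BPS):.3f} seconds")
```

With these figures the text version arrives in a fraction of a second while the image version takes several seconds, and the gap widens as more of the document is stored as images.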

There is one other time when you will need to handle text stored as an image and this is when you capture text into your computer using a scanner. Scanners do not understand the difference between text and any other images on your page. The entire page is an image as far as they are concerned. What you need to do after capturing the page as an image is to process it with software that can recognise text in images. This type of software is called Optical Character Recognition (OCR) software. It will examine the image of your page, identify those sections that it recognises as text, and convert them into actual text. Depending on your OCR software, the rest of the page may also be converted into a number of smaller images, thus maintaining the appearance of your page. (Adobe Acrobat is one program that can do this for you.) One thing to watch for once you have converted your page to text is that you use the Save As option in your OCR software, rather than the Save option, to save the text version of the page. Some programs speed up the Save option by only adding the changes to the end of the existing file. Saving the text version as changes to the end of the file containing the image version makes for an even bigger file than the original image. Using Save As (even if you keep the file name the same) will discard the single image version of your page and keep only the much smaller text version.
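
The effect of the two save strategies on file size can be sketched with a toy model in Python. This is not how any particular program lays out its files. The function names and the sizes used are hypothetical, and the sketch only shows why appending changes makes the file grow while writing a fresh copy does not.

```python
# Toy model of "Save" (append changes) versus "Save As" (write a fresh file).
# Function names and sizes are hypothetical, chosen for illustration only.

def incremental_save(existing_file: bytes, changes: bytes) -> bytes:
    """Model of Save: keep the old contents and append the changes."""
    return existing_file + changes


def save_as(current_content: bytes) -> bytes:
    """Model of Save As: write a new file holding only the current content."""
    return current_content


scanned_page = bytes(1_000_000)     # stand-in for a page image (assumed 1,000,000 bytes)
recognised_text = b"x" * 3_000      # stand-in for 3,000 characters of recognised text

after_save = incremental_save(scanned_page, recognised_text)
after_save_as = save_as(recognised_text)

print(f"after Save:    {len(after_save):,} bytes")
print(f"after Save As: {len(after_save_as):,} bytes")
```

In this model the file written by Save still carries the whole page image plus the new text, while the file written by Save As holds the text alone.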

There are times when storing text in or as images is appropriate and times when it is not. You need to consider the alternatives so as to produce documents efficiently in terms of how difficult they are to create and edit, how much storage space the resulting document will take up on your system, and how portable the document is to other systems.


This article written by Stephen Chapman, Felgall Pty Ltd.

