The Difference Between Binary and Text Files

When you access a file from within C or C++ you have a choice between treating the file as a binary file or as a text file.

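C opens a file with the fopen(filename, mode) function. The mode string identifies whether you are opening the file to read ("r"), write ("w"), or append ("a"), and also whether the file is opened in text or binary mode: text mode is the default, and adding a "b" (as in "rb", "wb", or "ab") selects binary mode.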
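As a minimal sketch of the two fopen modes, the hypothetical helpers below write the same two lines once with "w" (text) and once with "wb" (binary), then count the bytes each file really contains by reading it back with "rb", so no translation hides anything. The helper names are illustrative assumptions, not part of any library.

```cpp
#include <cassert>
#include <cstdio>

// Writes the same two lines of text to `path` using the fopen mode in `mode`
// ("w" for text mode, "wb" for binary mode). Returns false on any failure.
bool write_two_lines(const char* path, const char* mode) {
    std::FILE* f = std::fopen(path, mode);
    if (f == nullptr) return false;
    bool ok = std::fputs("one\ntwo\n", f) != EOF;  // '\n' ends each line
    return std::fclose(f) == 0 && ok;
}

// Counts the bytes stored in `path` by reading it in binary ("rb") mode,
// so no end-of-line translation takes place. Returns -1 on failure.
long count_bytes(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) return -1;
    long n = 0;
    while (std::fgetc(f) != EOF) ++n;
    std::fclose(f);
    return n;
}
```

On Linux or macOS both files should come out at 8 bytes; on Windows the file written with "w" should be 10 bytes, because each '\n' was stored as x'0D0A'.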

C++ opens a file by linking it to a stream (an ifstream, ofstream, or fstream). Text mode is the default; to open the file in binary mode you add the std::ios::binary flag to the open mode, for example std::ios::out | std::ios::binary. The functions you then use are a separate choice. The >> operator reads from the file and the << operator writes to it as formatted text, while get() and put() or read() and write() transfer characters without any formatting. Formatted I/O is normally paired with text mode and read() and write() with binary mode, but it is the open mode, not the choice of function, that decides whether end-of-line markers are translated.
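The sketch below is one way to see where the mode comes from in C++: it writes with the << operator to two streams, one opened with std::ios::binary and one without, then reads both back with the unformatted read() member from a binary-mode stream. The helper names write_with_insertion() and read_raw() are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <ios>
#include <string>

// Writes "A" and a newline with the formatted << operator. Whether that '\n'
// is translated depends on the open-mode flags: std::ios::binary turns the
// translation off, and leaving it out gives the default text mode.
bool write_with_insertion(const std::string& path, bool binary) {
    std::ios::openmode mode = std::ios::out | std::ios::trunc;
    if (binary) {
        mode |= std::ios::binary;
    }
    std::ofstream out(path, mode);
    if (!out) return false;
    out << "A" << '\n';
    out.close();
    return !out.fail();
}

// Reads the whole file with the unformatted read() member from a stream
// opened in binary mode, so the string holds the bytes exactly as stored.
std::string read_raw(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    std::string bytes;
    char buf[256];
    while (in) {
        in.read(buf, sizeof buf);
        bytes.append(buf, static_cast<std::string::size_type>(in.gcount()));
    }
    return bytes;
}
```

On Linux and macOS both files should hold x'41' x'0A'; on Windows the stream opened without std::ios::binary should store x'41' x'0D' x'0A'.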

So what exactly is the difference between text and binary modes? Text files contain lines (or records) of text, and each of these has an end-of-line marker written after it whenever you indicate that you have reached the end of a line. In C you indicate this by writing a '\n' character (fwrite() and fputs() never add one for you, although puts() does); in C++ you write '\n' or std::endl. Binary files are not broken up into separate lines or records, so when you write '\n' to a binary file it is stored as that single byte and the operating system's end-of-line marker is not substituted for it.
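Here is a minimal sketch of "indicating the end of a line" in C, assuming a hypothetical helper write_lines() that takes the lines as strings. fwrite() stores only the characters it is given, so the helper writes the '\n' itself; in C++ the equivalent would be out << line << std::endl, where std::endl writes '\n' and also flushes the stream.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Writes each string as one line of a text file.
bool write_lines(const char* path, const std::vector<std::string>& lines) {
    std::FILE* f = std::fopen(path, "w");  // "w" = write, text mode
    if (f == nullptr) return false;
    bool ok = true;
    for (const std::string& line : lines) {
        // fwrite() stores only the characters it is given; no end-of-line
        // marker is added automatically.
        ok = ok && std::fwrite(line.data(), 1, line.size(), f) == line.size();
        // The explicit '\n' marks the end of the line; in text mode the
        // stream converts it to the platform's end-of-line marker.
        ok = ok && std::fputc('\n', f) != EOF;
    }
    return std::fclose(f) == 0 && ok;
}
```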

Reading from a text file or a binary file is different too. A text file is read back as a sequence of separate records, split wherever an end-of-line marker appears, and line-oriented functions such as fgets() in C or std::getline() in C++ return one record at a time.
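A minimal sketch of reading a text file record by record, using std::getline() on a stream opened in the default text mode. The helper name read_lines() is hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Reads a text file one record (line) at a time. std::getline stops at each
// '\n' and does not store it, so every element holds just the line's text.
std::vector<std::string> read_lines(const std::string& path) {
    std::ifstream in(path);  // no std::ios::binary, so text mode
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

Because std::getline() only looks for '\n', a file with x'0D0A' line endings read on Linux or macOS leaves a trailing carriage return on each line, which is one reason it helps to know which end-of-line convention a file uses.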

So what is this end-of-line marker? That depends on the operating system that you are using. Apple Macintosh computers running the classic Mac OS (before Mac OS X) used a single carriage return (x'0D') as the end-of-line marker, while Unix-based operating systems, including Linux and the current macOS, use a single line feed (x'0A'). Most PC-based systems, including DOS, all versions of Windows, and OS/2, use a carriage return/line feed pair (x'0D0A'). C and C++ terminate strings with a low-value (null) character (x'00').
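To see which of these conventions a file uses, a program can read it in binary mode and look at the raw bytes. The sketch below assumes an ASCII-based character set (so '\r' is x'0D' and '\n' is x'0A') and a hypothetical helper detect_eol() that classifies the first marker it finds.

```cpp
#include <cassert>
#include <string>

// Classifies the first end-of-line marker in raw bytes read in binary mode.
// Returns "CRLF" for x'0D0A' (DOS, Windows, OS/2), "CR" for a lone x'0D'
// (classic Mac OS), "LF" for a lone x'0A' (Unix, Linux, macOS), or "none".
std::string detect_eol(const std::string& bytes) {
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        if (bytes[i] == '\r') {
            bool crlf = i + 1 < bytes.size() && bytes[i + 1] == '\n';
            return crlf ? "CRLF" : "CR";
        }
        if (bytes[i] == '\n') {
            return "LF";
        }
    }
    return "none";
}
```

A program could, for example, read the first few kilobytes of a file with fopen(..., "rb") and pass them to detect_eol().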

So what happens when we read from a text file is that the end-of-line marker for our operating system is converted into a single newline character ('\n', x'0A'), and when we write to a text file each '\n' is converted back into the appropriate end-of-line character(s). The newline is not the end-of-string indicator: fgets() keeps the '\n' in its buffer and places the null terminator after it, while std::getline() discards the newline altogether. This makes reading and writing text files much easier, because the program only ever deals with '\n' and the appropriate end-of-line markers are handled for us.
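The read-side conversion can be sketched by hand, which is useful when a file from another system is read on a platform whose text mode would leave its markers alone. The hypothetical helper normalize_eol() below turns each x'0D0A' pair into a single '\n'; it also treats a lone x'0D' as a line end for classic Mac OS files, which is an extra step added here, since Windows text mode only converts the x'0D0A' pair.

```cpp
#include <cassert>
#include <string>

// Converts CR LF pairs, and lone CRs, into a single '\n': the character a
// text-mode read hands to the program in place of the end-of-line marker.
std::string normalize_eol(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (std::string::size_type i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\r') {
            out += '\n';
            // Skip the LF of a CR LF pair so it is not emitted twice.
            if (i + 1 < raw.size() && raw[i + 1] == '\n') {
                ++i;
            }
        } else {
            out += raw[i];
        }
    }
    return out;
}
```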

With a binary file none of these conversions take place. When we read a binary file, the end-of-line characters for our operating system are read into the string and treated no differently from any other character. When we write to a binary file, the only end-of-line markers written are the ones we code into the output ourselves, and they are written exactly as coded regardless of the operating system we are running on. This makes binary mode much easier to work with when the file does not contain straight text, for example when a byte that happens to equal x'0A' or x'0D' is part of a number or an image rather than the end of a line of text.
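A minimal sketch of that behaviour: the hypothetical helper binary_round_trip() writes a buffer with "wb" and reads it back with "rb". Carriage returns, line feeds, and even zero bytes come back exactly as written, on any operating system.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Writes `data` to `path` in binary mode and reads it straight back, also in
// binary mode. Returns an empty string if any step fails.
std::string binary_round_trip(const char* path, const std::string& data) {
    std::FILE* out = std::fopen(path, "wb");
    if (out == nullptr) return std::string();
    bool written = std::fwrite(data.data(), 1, data.size(), out) == data.size();
    if (std::fclose(out) != 0 || !written) return std::string();

    std::FILE* in = std::fopen(path, "rb");
    if (in == nullptr) return std::string();
    std::string result;
    char buf[256];
    std::size_t n;
    // fread() returns the number of bytes read; 0 means end of file or error.
    while ((n = std::fread(buf, 1, sizeof buf, in)) > 0) {
        result.append(buf, n);
    }
    std::fclose(in);
    return result;
}
```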

A binary file can contain text, but that text is not considered to be broken up into lines by the occurrence of end-of-line markers. A binary file may also contain no text whatsoever. It is up to the program reading the file to make sense of the data contained in a binary file and convert it into something meaningful (e.g. an image or a series of fixed-length records).
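As a sketch of fixed-length records, assume a hypothetical Record struct holding a 32-bit id and a 12-byte name. Every record occupies sizeof(Record) bytes, so record N can be found by seeking to byte N * sizeof(Record). Binary mode matters here: an id of 10 contains a byte equal to x'0A', which a Windows text-mode stream would expand to x'0D0A', shifting every later record.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical fixed-length record layout.
struct Record {
    std::int32_t id;
    char name[12];  // NUL-padded, not necessarily NUL-terminated
};

// Appends one record to the end of a binary file.
bool append_record(const char* path, std::int32_t id, const char* name) {
    Record r{};  // zero-fill so unused name bytes are x'00'
    r.id = id;
    std::strncpy(r.name, name, sizeof r.name);
    std::FILE* f = std::fopen(path, "ab");
    if (f == nullptr) return false;
    bool ok = std::fwrite(&r, sizeof r, 1, f) == 1;
    return std::fclose(f) == 0 && ok;
}

// Reads record number `index` (0-based) by seeking straight to it.
bool read_record(const char* path, long index, Record* out) {
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) return false;
    long offset = index * static_cast<long>(sizeof(Record));
    bool ok = std::fseek(f, offset, SEEK_SET) == 0 &&
              std::fread(out, sizeof(Record), 1, f) == 1;
    std::fclose(f);
    return ok;
}
```

The in-memory layout of Record (padding, byte order) is chosen by the compiler, so a file written this way is only guaranteed to read back correctly with the same compiler and platform; a portable format would write each field byte by byte.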


This article written by Stephen Chapman, Felgall Pty Ltd.
