File I/O

Most sufficiently complex applications need to persist data. While databases are often used to persist complex hierarchical data, less complex data can be written to a text or binary file. In this module we’ll explore the use of file streams for reading and writing files.

Table of Contents

  1. Objectives
  2. Reading and Writing Files
  3. File Streams
  4. Opening File Streams
  5. Check If Files Open Successfully
  6. Closing Files
  7. Writing To A Text File
  8. Reading From a Text File
  9. Reading a Line of Text at a Time
  10. Reading Basic Delimited Text Data
  11. Simpler Delimited Text Data with Spaces
  12. Reading Structured Text Data
  13. Binary File Access
  14. Streaming Binary Structures
  15. Vector Iterator Trick with Binary Data
  16. Random File Access
  17. Random Access Example
  18. Struct Padding and Binary Files
  19. Filesystem Library
  20. Further Reading

Objectives

Upon completion of this module, you should be able to:

  • Open files for reading and writing using file streams.
  • Create file streams for text and binary data.
  • Write data to a text file using basic delimiter strategies.
  • Use overloaded stream operators to read and write data to a text file.
  • Read and write binary data sequentially and through random access.

Reading and Writing Files

Questions you’ll want to ask when reading and writing data to a file:

  • Should the data be stored in a human-readable text format, or a binary format that will be much harder for a human to verify? (This decision may come down to speed or file-size requirements.)
  • If a human-readable text format is desired, are you going to create your own custom format or use an established text format like CSV or JSON? (Except for the simplest data, an established format is preferred.)
  • If you’re storing binary data in text form, what encoding format will you use? (Ex. Base64)
  • If you’re storing binary data, do you need to read data written by machines with different binary endianness?

In the examples below we’ll explore simple text and binary storage without using established formats or worrying about endianness.

File Streams

File I/O is based on the stream-based I/O we learned about in the console I/O section.

Let’s start by reviewing the stream hierarchy:

Stream Class Hierarchy

🎵 Note:

The fstream header is required to use the ifstream, fstream, and ofstream classes.

Opening File Streams

Unlike cout and cin, which are available for streaming by default, we need to open files before we can stream to or from them.

#include <fstream>

int main() {
  std::fstream  inOutFile{"in-out-filename.txt"};  // Default Mode: ios_base::in | ios_base::out
  std::ofstream outputFile{"output-filename.txt"}; // Default Mode: ios_base::out | ios_base::trunc
  std::ofstream appendFile{"append-filename.txt", std::ios_base::app}; // Append output, don't truncate file.
  std::ifstream inputFile{"input-filename.txt"};   // Default Mode: ios_base::in
}

Check If Files Open Successfully

File streams can be tested as booleans to see if the file opened successfully:

#include <fstream>
#include <iostream>

int main() {
  std::ofstream outputFile{"output-filename.txt"};
  if (!outputFile) {
    std::cerr << "Could not open output file.\n";
  }

  std::ifstream inputFile{"input-filename.txt"};
  if (!inputFile) {
    std::cerr << "Could not open input file.\n";
  }
}

Closing Files

Files opened for input or output are automatically closed when their associated variable goes out of scope. They can also be manually closed earlier than this using the .close() method.

#include <fstream>

int main() {
  std::ofstream outputFile{"output-filename.txt"};
  // Nah, changed my mind:
  outputFile.close();
}

Writing To A Text File

To write to a file we use the insertion operator: <<
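A minimal sketch (the file name and values here are placeholders); insertion works just like streaming to std::cout:

#include <fstream>

int main() {
  std::ofstream outputFile{"output-filename.txt"};
  if (!outputFile) {
    return 1; // Bail out if the file could not be opened.
  }
  // Stream values into the file just as we would stream them to the console:
  outputFile << "High Score: " << 42 << "\n";
  outputFile << 3.14 << " " << 1.618 << "\n";
}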

Reading From a Text File

To read from a file we can use the extraction operator >> along with the fact that the stream itself evaluates to false once an extraction fails, which happens when we’ve reached the end of the file (EOF).
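As a rough sketch, assuming a file of whitespace-separated words, we keep extracting until the stream fails:

#include <fstream>
#include <iostream>
#include <string>

int main() {
  std::ifstream inputFile{"input-filename.txt"};
  std::string word;
  // Extraction returns the stream, which evaluates to false once reading fails (e.g. at EOF):
  while (inputFile >> word) {
    std::cout << word << "\n";
  }
}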

🎵 Note:

The >> operator treats whitespace characters (spaces, tabs, and newlines) as delimiters.

Reading a Line of Text at a Time

We can use std::getline() to read one line at a time from an input stream. The newline character is not included in the read data.
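A short sketch, assuming a placeholder input file:

#include <fstream>
#include <iostream>
#include <string>

int main() {
  std::ifstream inputFile{"input-filename.txt"};
  std::string line;
  // getline() returns the stream, so the loop stops once reading fails (e.g. at EOF):
  while (std::getline(inputFile, line)) {
    std::cout << line << "\n"; // line does not include the trailing newline character.
  }
}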

Reading Basic Delimited Text Data

When storing numbers to a file we need some way to keep numbers from getting combined. You don’t want to write a 40 beside a 22 and have that read back in as 4022.

Imagine a simplified version of the CSV format that is simply comma-delimited integers:
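A minimal sketch of reading such a file (the file name numbers.csv and contents like 40,22,7 are assumptions):

#include <fstream>
#include <iostream>

int main() {
  // C++17 "if initializer": declare and test the stream in a single statement.
  if (std::ifstream inputFile{"numbers.csv"}; inputFile) {
    int number;
    char delimiter; // Holds the comma that separates the numbers.
    while (inputFile >> number) {
      std::cout << number << "\n";
      inputFile >> delimiter; // Consume the trailing comma (fails harmlessly at EOF).
    }
  } else {
    std::cerr << "Could not open numbers.csv\n";
  }
}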

🎵 Note:

The example above makes use of a C++17 “if initializer” to open and test the file stream in a single statement.

Simpler Delimited Text Data with Spaces

Reading delimited data is simplified with a space character delimiter, as the input stream will automatically consume the spaces.
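For example, here’s a sketch that writes and then re-reads some space-delimited integers (file name assumed):

#include <fstream>
#include <iostream>

int main() {
  // Write some space-delimited integers:
  std::ofstream outputFile{"numbers.txt"};
  outputFile << 40 << " " << 22 << " " << 7 << "\n";
  outputFile.close();

  // Read them back; >> automatically skips the whitespace between values:
  std::ifstream inputFile{"numbers.txt"};
  int number;
  while (inputFile >> number) {
    std::cout << number << "\n";
  }
}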

Reading Structured Text Data

Recall that in our section on operator overloading we created a Money class with custom I/O stream operators.

With an overloaded >> operator for input streams we can easily read well-formatted files into vectors:

// Open input stream:
std::ifstream inputFile{"input-file.txt"};
// Read all Money entries from the file into a vector:
std::vector<Money> bank{ // Construct the vector using a pair of stream iterators:
  std::istream_iterator<Money>{inputFile},
  {} // Short form for the default-constructed end-of-stream iterator.
};
// Loop through the Money entries we read:
for (Money money : bank) {
  std::cout << money << "\n";
}

Here’s a version of the Money class used to read in a file of dollar amounts, one per line, in the $m.n format (where m and n are integers):
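The member names and error handling below are a sketch rather than the original implementation, but a Money class along these lines works with the vector-filling code above:

#include <iostream>

class Money {
  int dollars{0};
  int cents{0};

public:
  // Read a Money value in the $m.n format, e.g. "$12.99":
  friend std::istream& operator>>(std::istream& in, Money& money) {
    char dollarSign{}, dot{};
    in >> dollarSign >> money.dollars >> dot >> money.cents;
    if (in && (dollarSign != '$' || dot != '.')) {
      in.setstate(std::ios::failbit); // A badly formatted entry stops further parsing.
    }
    return in;
  }

  // Write a Money value back out in the same format:
  friend std::ostream& operator<<(std::ostream& out, const Money& money) {
    return out << "$" << money.dollars << "." << money.cents;
  }
};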

🎵 Note:

For simplicity’s sake we’re halting all file parsing if we encounter a badly formatted entry.

Binary File Access

So far we’ve been reading and writing our data in text format. We can conserve space and potentially save time by writing in a binary format.

Let’s start by writing strings to a binary file. We will also need to store the length of each stored string so that they can be read back in separately.
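Here’s a sketch of that approach (file name assumed); the string’s length is written first as raw bytes, followed by its characters:

#include <fstream>
#include <iostream>
#include <string>

int main() {
  // Write a length-prefixed string to a binary file:
  std::string message{"Hello Binary World"};
  std::ofstream outputFile{"strings.bin", std::ios::binary};
  std::size_t length = message.size();
  // write() needs a pointer to the raw bytes and a byte count:
  outputFile.write(reinterpret_cast<const char*>(&length), sizeof(length));
  outputFile.write(message.data(), length);
  outputFile.close();

  // Read the length back first, then read exactly that many characters:
  std::ifstream inputFile{"strings.bin", std::ios::binary};
  std::size_t readLength = 0;
  inputFile.read(reinterpret_cast<char*>(&readLength), sizeof(readLength));
  std::string readMessage(readLength, '\0');
  inputFile.read(readMessage.data(), readLength);
  std::cout << readMessage << "\n";
}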

🎵 Note:

If we were actually just saving strings to a file we may as well be using a text file.

⏳ Wait For It:

Note the use of pointers, which we’ll cover in more detail in future sections.

Streaming Binary Structures

We can also write POD (Plain Old Data) structs to a binary file, as in structs that only contain primitives or other POD structs. This process is often called serialization or marshalling.
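Here’s a sketch using the PlainOldMoney struct discussed later in this module (the file name and values are assumptions):

#include <fstream>
#include <iostream>

// A POD struct: only primitive members, no constructors, destructors, or virtual functions.
struct PlainOldMoney {
  int dollars;
  int cents;
};

int main() {
  // Serialize a struct by writing its raw bytes to the file:
  PlainOldMoney savings{100, 99};
  std::ofstream outputFile{"money.bin", std::ios::binary};
  outputFile.write(reinterpret_cast<const char*>(&savings), sizeof(savings));
  outputFile.close();

  // Deserialize by reading the same number of bytes back into a struct:
  PlainOldMoney loaded{};
  std::ifstream inputFile{"money.bin", std::ios::binary};
  inputFile.read(reinterpret_cast<char*>(&loaded), sizeof(loaded));
  std::cout << "$" << loaded.dollars << "." << loaded.cents << "\n";
}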

🎵 Note:

For more on PODs see this StackOverflow response.

Vector Iterator Trick with Binary Data

Earlier we used an iterator to quickly fill a vector with text data from an input stream (parsed by an overloaded >>). We can use a similar trick to read in an entire file of binary data, if all we want are the raw bytes.

std::ifstream inputFile("input.bin", std::ios::binary);
std::vector<char> data{
  std::istreambuf_iterator<char>(inputFile), // istreambuf iterator reads raw, unformatted bytes.
  {}
};

Random File Access

So far with both our text and our binary files we’ve been writing and reading our data sequentially from the start of a file until the end. We can also perform what is called “random file access”, not as in accessing random locations, but as in the ability to read/write to/from arbitrary locations within a file.

  • With input streams we can move the internal read pointer within the file using the seekg() method.
  • With output streams we can move the internal write pointer within the file using the seekp() method.

Movement performed with these functions is done relative to:

  • The beginning of the file: std::ios::beg
  • The current location within the file: std::ios::cur
  • The end of the file: std::ios::end

Assuming an ifstream named inputFile:

inputFile.seekg(0, std::ios::end);  // Move to the end of the file.
inputFile.seekg(0, std::ios::beg);  // Move to the beginning of the file.
inputFile.seekg(50, std::ios::beg); // Move to the 50th byte in the file.
inputFile.seekg(50, std::ios::cur); // Move forward 50 bytes from the current position to the 100th byte.

🎵 Note:

The g in seekg stands for “get” and the p in seekp stands for “put”.

Random Access Example
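Here’s a sketch of random access, reusing the PlainOldMoney struct from the sketch above (file name and values are assumptions): three records are written sequentially, then the third is read back directly by seeking past the first two.

#include <fstream>
#include <iostream>

struct PlainOldMoney {
  int dollars;
  int cents;
};

int main() {
  // Write three records sequentially:
  std::ofstream outputFile{"money.bin", std::ios::binary};
  for (int i = 1; i <= 3; ++i) {
    PlainOldMoney money{i * 10, i};
    outputFile.write(reinterpret_cast<const char*>(&money), sizeof(money));
  }
  outputFile.close();

  // Jump straight to the third record (index 2) without reading the first two:
  std::ifstream inputFile{"money.bin", std::ios::binary};
  inputFile.seekg(2 * sizeof(PlainOldMoney), std::ios::beg);
  PlainOldMoney money{};
  inputFile.read(reinterpret_cast<char*>(&money), sizeof(money));
  std::cout << "$" << money.dollars << "." << money.cents << "\n"; // Prints $30.3
}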

Struct Padding and Binary Files

Let’s say that we wanted to save some file space when writing our PlainOldMoney structs to a binary file. We might try to decrease the size of the struct by changing the datatype of the cents member from an int (4 bytes) to a short (2 bytes).

struct PlainOldMoney {
  int dollars; // int: 4 bytes
  short cents; // short: 2 bytes
};

Naively we might assume the size of our struct has gone down from 8 bytes to 6 bytes, but if we open the associated binary file in a hex editor we’ll notice something odd: each structure is taking up more memory than we might expect. The structs are still 8 bytes rather than 6!

The TL;DR is that C and C++ compilers follow certain rules which add extra padding into structs. This is because modern CPUs read and write memory most efficiently when the data is “naturally aligned”.

From a Stack Overflow thread:

“On 64 bit systems, int should start at addresses divisible by 4, and long by 8, short by 2. For struct, other than the alignment need for each individual member, the size of whole struct itself will be aligned to a size divisible by size of largest individual member, by padding at end.”

This is why our 6-byte struct was padded to 8 bytes. The largest member of PlainOldMoney is an int, so the overall struct size is padded until it is divisible by 4.
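A quick way to see this for yourself is to print the struct’s size; the exact value is platform-dependent, but on typical desktop compilers this prints 8 rather than 6:

#include <iostream>

struct PlainOldMoney {
  int dollars; // 4 bytes
  short cents; // 2 bytes, followed by 2 bytes of padding
};

int main() {
  std::cout << sizeof(PlainOldMoney) << "\n"; // Typically prints 8 due to padding.
}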

Read the following for more details:

Filesystem Library

Navigating the file system in an OS-agnostic manner is often required to:

  • List files.
  • Build file paths.
  • Test for the existence of directories and files.
  • Create directories and files.
  • Copy, move, and delete directories and files.

As of C++17, <filesystem> is part of the standard library. openFrameworks doesn’t yet support C++17 projects, so it includes a number of helper classes to work with the file system.
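As a sketch of what the standard library side looks like (the data directory name is an assumption):

#include <filesystem>
#include <iostream>

namespace fs = std::filesystem;

int main() {
  // Build a path in an OS-agnostic way:
  fs::path dataDir = fs::current_path() / "data";

  // Create the directory if it doesn't already exist:
  if (!fs::exists(dataDir)) {
    fs::create_directory(dataDir);
  }

  // List the files in the directory:
  for (const auto& entry : fs::directory_iterator{dataDir}) {
    std::cout << entry.path().filename() << "\n";
  }
}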

Further Reading