Accelerated C++ Solution to Exercise 7-4

Exercise 7-4

The output produced by the cross-reference program will be ungainly if the input file is large. Rewrite the program to break up the output if the lines get too long.

Background

The original cross-reference program from Solution to Exercise 7-0 (Part 2 / 3) has no control over output line length. For example, suppose my input “article” contains many repeated words, such as this:

apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple
orange orange orange orange orange orange orange orange orange orange orange orange orange orange orange orange
orange orange orange orange orange orange orange orange orange orange orange orange orange orange orange orange orange orange orange orange orange orange orange orange orange orange orange orange orange orange
orange orange orange orange orange orange orange
banana banana banana
banana
banana banana banana
banana banana
banana banana banana banana banana banana banana
banana banana
apple apple apple apple apple apple apple apple apple apple apple apple apple apple
apple apple apple apple apple apple apple apple apple apple apple apple apple apple
apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple apple

Given the above input, the default program produces an output summary like this:

apple occurs on line(s): 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 1
3, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13
banana occurs on line(s): 5, 5, 5, 6, 7, 7, 7, 8, 8, 9, 9, 9, 9, 9, 9, 9, 10, 10
orange occurs on line(s): 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
 3, 4, 4, 4, 4, 4, 4, 4

The output lines can become very long, so the user may need to scroll left and right to see all the line numbers, which is not very user friendly. This is the “ungainly” output that the exercise refers to.

This motivates us to add a control that ensures the output line length never exceeds a pre-defined value. For example, if we set the maximum line length to 30 characters and the output contains 100 characters, the program should split it into 4 lines (the first 3 lines contain 30 characters each, and the 4th line contains the remaining 10). This way, the user won’t need to scroll left and right to see all the line numbers.

The following section will describe a solution to this problem.

Solution

This is our solution strategy:

  1. Make a copy of the project (i.e. source and header files) as per Solution to Exercise 7-0 (Part 2 / 3).
  2. Update the main program to incorporate the desired line-length control.

Incorporate Line-length Control

First of all, we need to define a const string::size_type lineLength, which is the maximum line length of the output.

We then need to have a means to concatenate various output values.

In the Solution to Exercise 5-1, we learnt that std::ostringstream (from the <sstream> header) is more convenient for concatenation than std::string (from the <string> header): there are far fewer constraints on concatenating values of different types with a std::ostringstream than with a std::string. For this reason, we shall use std::ostringstream to build the output. Note also that an ostringstream can be converted into a string easily, via its str() member function.
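To illustrate the point about mixed-type concatenation, here is a minimal sketch. The helper name formatEntry is hypothetical (it is not part of the original project); it simply builds one cross-reference entry from a word and its line numbers.

```cpp
#include <sstream>  // std::ostringstream
#include <string>   // std::string
#include <vector>   // std::vector

// Hypothetical helper (not part of the original project): builds one
// cross-reference entry as a single string.
std::string formatEntry(const std::string& word, const std::vector<int>& lines)
{
  std::ostringstream out;

  // operator<< accepts both strings and ints, so no manual conversion
  // from int to string is needed.
  out << word << " occurs on line(s): ";
  for (std::vector<int>::size_type i = 0; i != lines.size(); ++i) {
    if (i != 0)
      out << ", ";
    out << lines[i];
  }

  // str() returns a copy of the accumulated text as a std::string.
  return out.str();
}
```

Calling formatEntry with the word banana and the line numbers 5, 5, 6 gives the string "banana occurs on line(s): 5, 5, 6", which can then be measured and split like any other std::string.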

When displaying the line numbers for each word, we implement an if-condition that starts a new line whenever the character index reaches a multiple of our pre-defined lineLength, so that no output line exceeds it. The modulus operator (%) will come in handy for this.
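The modulus test can be sketched in isolation as follows. The function name wrapFixedWidth is hypothetical, and this variant differs slightly from the main program: it returns the wrapped text instead of printing it, it skips the break at index 0, and it guards against a width of zero.

```cpp
#include <string>  // std::string

// Hypothetical helper: inserts a newline after every `width` characters.
std::string wrapFixedWidth(const std::string& text,
                           std::string::size_type width)
{
  if (width == 0)
    return text;  // avoid taking a modulus by zero

  std::string result;
  for (std::string::size_type i = 0; i != text.size(); ++i) {
    // i is a non-zero multiple of width exactly when a row has just
    // been filled, so that is where the line break goes.
    if (i != 0 && i % width == 0)
      result += '\n';
    result += text[i];
  }
  return result;
}
```

With a width of 3, the text "abcdefgh" becomes "abc", "def" and "gh" on three rows. With a width of 30, a 100-character string ends up on four rows of 30, 30, 30 and 10 characters, matching the example in the Background section.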

The Project

Simply re-use the entire project from Solution to Exercise 7-0 (Part 2 / 3), and replace the main program with the following. (Note that, for the sake of illustration, the pre-defined lineLength is set to a maximum of 60 characters per line.)


#include <iostream>   // std::cin, std::cout, std::endl
#include <map>        // std::map
#include <string>     // std::string
#include <vector>     // std::vector
#include <sstream>    // std::ostringstream
#include "xref.h"     // xref

using std::cin;
using std::cout;
using std::endl;
using std::map;
using std::string;
using std::vector;
using std::ostringstream;

// Find all the lines that refer to each word in the input
// (S7.3/128)
int main()
{
  // Call xref using split by default.
  map<string, vector<int> > ret = xref(cin);

  // Set the width of output line to this max width limit.
  const string::size_type lineLength = 60;

  // Write the results.
  for (map<string, vector<int> >::const_iterator it = ret.begin();
       it != ret.end(); ++it) {

    // We use ostringstream for its powerful concatenation properties.
    // We can concatenate values of many more types to an ostringstream
    // than to a string.
    ostringstream outputStream;

    // Write the word
    outputStream << it->first << " occurs on line(s): ";

    // Followed by one or more line numbers.
    vector<int>::const_iterator line_it = it->second.begin();
    outputStream << *line_it;
    ++line_it;

    // Write the rest of the line numbers, if any.
    while (line_it != it->second.end()) {
      outputStream << ", " << *line_it;
      ++line_it;
    }

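    // Break the output into multiple lines with max width of lineLength.
    // Note that i == 0 also satisfies the test below, so each entry starts
    // with a line break; this produces the blank line between entries.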
    string outputLine = outputStream.str();
    for (string::size_type i = 0; i != outputLine.size(); ++i ) {
      if (i % lineLength == 0) {
        cout << endl;
      }
      cout << outputLine[i];
    }

    // Write a new line to separate each word from the next.
    cout << endl;
  }

  return 0;
}

Test Program

Re-submitting the same input (see the top of this post), we get output that looks much more “controlled” in terms of line length.


apple occurs on line(s): 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 11, 1
1, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 1
2, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 1
3, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 1
3, 13, 13, 13, 13, 13, 13, 13, 13, 13

banana occurs on line(s): 5, 5, 5, 6, 7, 7, 7, 8, 8, 9, 9, 9
, 9, 9, 9, 9, 10, 10

orange occurs on line(s): 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3
, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4
, 4

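Not perfect (a line break can still fall in the middle of a number, as happens with some of the 11s, 12s and 13s above), but better.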
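One possible refinement, sketched here under my own assumptions rather than taken from the book, is to break only at spaces so that a number is never split across two rows. The helper name wrapAtSpaces is hypothetical; it fills each row greedily with space-separated tokens.

```cpp
#include <sstream>  // std::istringstream
#include <string>   // std::string

// Hypothetical helper: greedy wrap that only breaks between tokens.
std::string wrapAtSpaces(const std::string& text,
                         std::string::size_type width)
{
  std::istringstream in(text);
  std::string token;
  std::string result;
  std::string::size_type used = 0;  // characters already on the current row

  while (in >> token) {
    if (used != 0) {
      if (used + 1 + token.size() > width) {
        // The token (plus a separating space) does not fit: new row.
        result += '\n';
        used = 0;
      } else {
        result += ' ';
        ++used;
      }
    }
    result += token;
    used += token.size();
  }
  return result;
}
```

For example, wrapping "1, 11, 12, 13" to a width of 7 gives "1, 11," on the first row and "12, 13" on the second. A token longer than the width is placed on a row of its own and will exceed the limit, so this sketch trades a strict width guarantee for unbroken numbers.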

Reference

Koenig, Andrew & Moo, Barbara E., Accelerated C++, Addison-Wesley, 2000