Exercise 7-3
The cross-reference program from S7.3/126 could be improved: As it stands, if a word occurs more than once on the same input line, the porgram will report that line multiple times. Change the code so that it detects multiple occurences of the same line number and inserts the line number only once.
Background
Remember that when we run the Chapter 7 cross-reference program (i.e. Solution to Exercise 7-0 (Part 2 / 3) ), we have the following test results:
apple orange banana orange orange apple banana mango ^Z apple occurs on line(s): 1, 3 banana occurs on line(s): 1, 3 mango occurs on line(s): 4 orange occurs on line(s): 1, 2, 2
Note:
- Recall that the xref program assumes the first line as line 1 (instead of 0), second line as line 2 (instead of 1). This is as designed.
- The word “orange” appears on line 2 twice. So when it comes to being displayed in the summary, it shows the line number “2” twice.
- The objective of this exercise is to only store non-duplicated line number.
Solution Strategy
We only need to make a very small enhancement – to adjust the xref function by adding a small bit of code. Include a if condition to check whether the line number already exists for the corresponding associative array. Use the find utility of the <algorithm> directive to help us do this.
The Project
We can reuse the entire project as per Solution to Exercise 7-0 (Part 2 / 3). Simply replace the xref.cpp file with this new one:
xref.cpp
#include <iostream> // std::cin, std::istream #include <map> // std::map #include <string> // std::string #include <vector> // std::vector #include <algorithm> // std::find #include "split.h" // split using std::cin; using std::map; using std::string; using std::vector; using std::istream; using std::find; // Find all the lines that refer to each word in the input // In this program, the first line is represented by line 1, // second line is represented by line 2, etc. // (S7.3/126) // Adjusted as per Exercise 7-3: store only non-duplicated line numbers. map<string, vector<int> > xref(istream& in, vector<string> find_words(const string&) = split) { string line; int line_number = 0; map<string, vector<int> > ret; // map string to vector<int> // Read the next line while (getline(in, line)) { ++line_number; // Break the input line into words vector<string> words = find_words(line); // remember that each word occurs on the current line for (vector<string>::const_iterator it = words.begin(); it != words.end(); ++it) { // store only non-duplicated line numbers. if ( find( ret[*it].begin(), ret[*it].end(), line_number) == ret[*it].end() ) ret[*it].push_back(line_number); } } return ret; }
Test Program
Re-submitting the program with the same input, the program now produces non-duplicated line numbers per associative array (word).
apple orange banana orange orange apple banana mango ^Z apple occurs on line(s): 1, 3 banana occurs on line(s): 1, 3 mango occurs on line(s): 4 orange occurs on line(s): 1, 2
This solution works, but isn’t the most efficient. Since the lines are checked in a linear fashion, we know that the last line number stored in our vector for the current word is either a previous line or the current line. Because of this, we only need to check the last entry to make sure it’s not equal to the current line number. It is important that we check that there is an entry for the current word first because otherwise we are trying to access an element of a vector that may not exist. Here is my version:
for (vector::const_iterator it = words.begin(); it != words.end(); ++it) {
if (ret[*it].size() > 0 && *--ret[*it].end() == line_number) { /*Do nothing*/ }
else { ret[*it].push_back(line_number); }
}
Anonymous is correct. Also you can shorten all anon’s code to this:
if (ret[it].empty() || ret[it].back() != line_number)
ret[*it].push_back(line_number);
Using empty() is preferred to checking for zero size.
first line of that should be *it, not it. :)