Exercise 3-3
Write a program to count how many times each distinct word appears in its output.
Solution
Note: Viktor has kindly pointed out the fact that I have mis-read the question as “How many distinct words are there” (instead of how many times each distinct word appears). The solution below therefore, does not actually under the question above!!! (If I have time I may redo the question – otherwise please just ignore my solution below!!!)
My solution is a result of combining the knowledge gained from Solution to Exercise 2-8 (on recursive operations between element v[i] and v[i+1]) and Solution to Exercise 3-2 (on manipulating vector).
The solution strategy:
- Ask user to provide the list of words so we can append the string elements to a vector, v.
- Compute the size of vector, N.
- If vector size is 0, the number of distinct words is also 0.
- If vector size is 1, the number of distinct words is also 1.
- If vector size is 2 or above, we apply an algorithm to identify and count the distinct words. (To be explained further below).
The algorithm (for the case of vector size of 2 or above) is described as followings:
- We sort the vector v in non-descending order.
- We initialise the index A to 0, and index B to 1. These indices will be used for comparing adjacent elements within the vector (to check for distinctive elements).
- We also initialise the distinct element count figure ND to 1 – we know that we have at least 1 distinct word.
- We perform N-1 number of comparisons between the adjacent elements v[A] and v[B]. (e.g. if there are 10 elements, we can only compare 9 sets of adjacent elements).
- During each comparison step, if v[B] is not the same as v[A], it implies that v[B] is a newly discovered distinct word. We increment ND.
- Note: if v[B] is the same as v[A], it implies v[B] is a repeated word. We do nothing in that case, as we are only interested in discovering new distinct words.
- We then increment the indices A and B by 1 for the next comparisons. i.e. we first compare v[0] and v[1], then v[1] and v[2], then v[2] and v[3], etc…
- We display the final value of ND – this is the number of distinct words computed.
Putting this all together, we have our full program.
#include <algorithm> #include <iostream> #include <string> #include <vector> using std::cin; // <iostream> using std::cout; // <iostream> using std::endl; // <iostream> using std::sort; // <algorithm> using std::string; // <string> using std::vector; // <string> int main() { // display header message cout << "***************************************************************\n" "*** This program computes number of unique words ***\n" "***************************************************************\n"; cout << endl; // ask for a list of numbers and store the list as a vector cout << "Enter a list of words one by one: "; vector<string> v; string x; while (cin >> x) v.push_back(x); // append new input to the vector cout << endl; // define and compute core vector variables typedef vector<string>::size_type vecSize; // define a type for vector size related variables vecSize N = v.size(); // number of elements in the vector vecSize numLoops = N - 1; // number of (comparison) operators required vecSize ND; // number of distinct words // Check vector size, action accordingly if (N ==0 ) { ND = 0; cout << "Number of distinct words = " << ND << endl; return 0; } else if (N ==1 ) { ND = 1; cout << "Number of distinct words = " << ND << endl; return 0; } else { // sort the vector; sort(v.begin(),v.end()); // display some results to console window cout << "Vector size (number of words entered): " << N << endl; cout << endl; cout << "Display the sorted (non-descending) distinct words below." << endl; cout << v[0] << endl; // declare new variables vecSize A = 0; // vector index vecSize B = 1; // vector index ND = 1; // number of distinct words // Loop through the vector, compute ND, and identify the distinct words for (vecSize i = 0; i != numLoops; ++i) { if (v[B] != v[A]) { ++ND; cout << v[B] << endl; // display any newly discovered distinct words } ++A; ++B; } // Display final distinct word count cout << endl; cout << "Number of distinct elements (words): " << ND << endl; } return 0; }
Result
Below shows the results of the 3 sets of test:
- N = 0
- N=1
- N>=2
The results also reveal that fact that the sorting elements of a vector is case-sensitive. e.g. the string John (beginning with upper case) is different to john (all lower case).
N = 0
*************************************************************** *** This program computes number of unique words *** *************************************************************** Enter a list of words one by one: ^Z Number of distinct words = 0
N = 1
*************************************************************** *** This program computes number of unique words *** *************************************************************** Enter a list of words one by one: Johnny ^Z Number of distinct words = 1
N >= 2
*** This program computes number of unique words *** *************************************************************** Enter a list of words one by one: john peter pete johnny john John ^Z Vector size (number of words entered): 6 Display the sorted (non-descending) distinct words below. John john johnny pete peter Number of distinct elements (words): 5