# Exercise 3-3

Write a program to count how many times each distinct word appears in its output.

# Solution

**Note**: Viktor has kindly pointed out the fact that I have mis-read the question as “How many distinct words are there” (instead of how many times each distinct word appears). The solution below therefore, does not actually under the question above!!! (If I have time I may redo the question – otherwise please just ignore my solution below!!!)

My solution is a result of combining the knowledge gained from Solution to Exercise 2-8 (on recursive operations between element v[i] and v[i+1]) and Solution to Exercise 3-2 (on manipulating vector).

The solution strategy:

- Ask user to provide the list of words so we can append the string elements to a vector, v.
- Compute the size of vector, N.
- If vector size is 0, the number of distinct words is also 0.
- If vector size is 1, the number of distinct words is also 1.
- If vector size is 2 or above, we apply an algorithm to identify and count the distinct words. (To be explained further below).

The algorithm (for the case of vector size of 2 or above) is described as followings:

- We sort the vector
**v**in non-descending order. - We initialise the index
**A**to 0, and index**B**to 1. These indices will be used for comparing adjacent elements within the vector (to check for distinctive elements). - We also initialise the distinct element count figure
**ND**to 1 – we know that we have at least 1 distinct word. - We perform
**N-1**number of comparisons between the adjacent elements**v[A]**and**v[B]**. (e.g. if there are 10 elements, we can only compare 9 sets of adjacent elements). - During each comparison step, if
**v[B]**is not the same as**v[A]**, it implies that**v[B]**is a newly discovered distinct word. We increment**ND**. - Note: if
**v[B]**is the same as**v[A]**, it implies**v[B]**is a repeated word. We do nothing in that case, as we are only interested in discovering new distinct words. - We then increment the indices
**A**and**B**by 1 for the next comparisons. i.e. we first compare v[0] and v[1], then v[1] and v[2], then v[2] and v[3], etc… - We display the final value of
**ND**– this is the number of distinct words computed.

Putting this all together, we have our full program.

#include <algorithm> #include <iostream> #include <string> #include <vector> using std::cin; // <iostream> using std::cout; // <iostream> using std::endl; // <iostream> using std::sort; // <algorithm> using std::string; // <string> using std::vector; // <string> int main() { // display header message cout << "***************************************************************\n" "*** This program computes number of unique words ***\n" "***************************************************************\n"; cout << endl; // ask for a list of numbers and store the list as a vector cout << "Enter a list of words one by one: "; vector<string> v; string x; while (cin >> x) v.push_back(x); // append new input to the vector cout << endl; // define and compute core vector variables typedef vector<string>::size_type vecSize; // define a type for vector size related variables vecSize N = v.size(); // number of elements in the vector vecSize numLoops = N - 1; // number of (comparison) operators required vecSize ND; // number of distinct words // Check vector size, action accordingly if (N ==0 ) { ND = 0; cout << "Number of distinct words = " << ND << endl; return 0; } else if (N ==1 ) { ND = 1; cout << "Number of distinct words = " << ND << endl; return 0; } else { // sort the vector; sort(v.begin(),v.end()); // display some results to console window cout << "Vector size (number of words entered): " << N << endl; cout << endl; cout << "Display the sorted (non-descending) distinct words below." << endl; cout << v[0] << endl; // declare new variables vecSize A = 0; // vector index vecSize B = 1; // vector index ND = 1; // number of distinct words // Loop through the vector, compute ND, and identify the distinct words for (vecSize i = 0; i != numLoops; ++i) { if (v[B] != v[A]) { ++ND; cout << v[B] << endl; // display any newly discovered distinct words } ++A; ++B; } // Display final distinct word count cout << endl; cout << "Number of distinct elements (words): " << ND << endl; } return 0; }

# Result

Below shows the results of the 3 sets of test:

- N = 0
- N=1
- N>=2

The results also reveal that fact that the sorting elements of a vector is case-sensitive. e.g. the string John (beginning with upper case) is different to john (all lower case).

## N = 0

*************************************************************** *** This program computes number of unique words *** *************************************************************** Enter a list of words one by one: ^Z Number of distinct words = 0

## N = 1

*************************************************************** *** This program computes number of unique words *** *************************************************************** Enter a list of words one by one: Johnny ^Z Number of distinct words = 1

## N >= 2

*** This program computes number of unique words *** *************************************************************** Enter a list of words one by one: john peter pete johnny john John ^Z Vector size (number of words entered): 6 Display the sorted (non-descending) distinct words below. John john johnny pete peter Number of distinct elements (words): 5

Hi, I think you misunderstood the problem. The description says to write a program that counts how many times each distinct word appears, not how many distinct words there are.

Ooops! You are so right!!! (I definitely had mis-read the question)

Hi! First of all, thank you for taking the time to do what you did for all the chapters. It helps immensely.

Moreover, I have a question regarding one part of your solution.

You decide to create variables A and B and give them the values 0 and 1 (respectively).

Is there any reason as to why you didn’t use the same ‘i’ of the for loop and simply say [i] and [i+1] ? (That way they also increment)

Like this:

for (vecSize i = 0; i != numLoops; ++i)

{

if (v[i+1] != v[i])

.

.

.

Also, I believe that your counter and the int in the last for loop should be ints and not vecSize. I think vecSize works because vector::size_type is a typedef of an int or longint, but I am almost sure you should not use it in this case.

Finally, and this is just what I believe is more of a common practice, ‘<‘ is more commonly used than’ !=’ , leaving the for loop like this.

int count = 1;

`for (int i=0; i < size-1; i++) {`

if (words[i+1] != words[i]) {

++count;

}

}

In conclusion, that is the beauty of programming, that a simple question can have so many different solutions which can all reach the correct output.

But again, thank you for your input and the good work you’ve done here!

here is my solution:

`if (size == 0)`

{

cout << "There is no words in the vector. Size = " << size <<

"\nPlease try again!" << endl;

}

else if (size == 1)

{

cout << "Number of distinct words = " << size <<

"\nPlease try again!" << endl;

}

else

{

sort(words.begin(), words.end());

`int a = 0; // index of the first word in the vector`

int b = 1; // index of the second word in the vector

int x = 1;

for (int i = 0; i < size; i++)

{

if (words[a] == words[b])

{

x++;

}

else

{

cout << words[a] << " - " << x << endl;

x = 1;

}

a++;

b++;

}

}