Exercise 5-2
Write the complete new version of the student-grading program, which extracts records for failing students, using vectors. Write another that uses lists. Measure the performance difference on input files of 10 lines, 1,000 lines, and 10,000 lines.
Solution
The objective of this exercise is to compute the overall job run duration for the function vector<Student_info> extract_fails, and another function list<Student_info> extract_fails. These two functions are essentially the same, except one uses vector and the other uses list.
Fortunately the codes in Chapter 5 of the book (version 3 for vector and version 4 for list), or my Solution to Exercise 5-0 (Part 1/3) has already provided the skeleton structure for this. All I need to do is to adjust the program slightly to incorporate the followings:
- Have a pair of start and end time-stamps just before and after the extract_fails function.
- Compute the duration by taking the time difference between the end and start time.
- Display the duration.
Once we have these new features in our revised program, we are then able to re-run the program using varying number of input (lines).
As I did not recall there were any mentions about time-stamping in the book, I did a Google search and found this very clever clock_t technique (by Dolph) on the Stack Overflow Forum, in conjunction with the clock_t documentation on the cplusplus.com website.
i.e. the <ctime> directive, clock_t , clock(), and CLOCKS_PER_SEC.
This is how to use it.
clock_t startTime = clock(); doWhateverFunctionHere(); clock_t endTime = clock(); clock_t clockTicksTaken = endTime - startTime; double timeInSeconds = clockTicksTaken / (double) CLOCKS_PER_SEC;
The strategy is therefore:
- Make a copy of the entire project as per Solution to Exercise 5-0 (Part 1/3).
- Create a new extract_fails.cpp that contains two functions: Function test_extract_fails_v3 computes and displays the duration for the vector<Student_info> case. Function test_extract_fails_v4 is similar but for the list<Student_info> case. Wrap these under a header file extract_fails.h.
- Simplify the main program to enable easy invocation of each function.
The Project
For clarity sake I include the high level project tree here. Note that majority of the files are from Solution to Exercise 5-0 (Part 1/3) – any files that are not mentioned below are mentioned in that post. In this post I shall only include the new / revised files.
- main.cpp – revised for easy invocation of the functions test_extract_fails_v3 and Function test_extract_fails_v4.
- test_extract_fails.cpp – contains the functions test_extract_fails_v3 and test_extract_fails_v4.
- test_extract_fails.h – declare the functions in test_extract_fails.cpp.
main.cpp
#include "test_extract_fails.h" int main() { // invoke either one of the following as required // test_extract_fails_v3(); test_extract_fails_v4(); return 0; }
test_extract_fails.cpp
#include <iostream> #include <string> #include <vector> #include <list> #include <iomanip> #include <ios> #include "grade.h" #include "Student_info.h" #include "extract_fails.h" #include <ctime> using std::string; using std::vector; using std::list; using std::cin; using std::cout; using std::endl; using std::setprecision; using std::streamsize; // There are two functions here. // - The top one is used to test out extract_fails version 3 (vector<Student_info>). // - The bottom one is for testing out extract_fails version 4 (list<Student_info>). // for testing extract_fails_v3 - that uses vector<Student_info> int test_extract_fails_v3() { vector<Student_info> students; Student_info record; // read and store all the student's data. while (read(cin, record)) students.push_back(record); // Extract the failed students (use extract_fails_v3) and time-stamp it! clock_t startTime = clock(); vector<Student_info> students_failed = extract_fails_v3(students); clock_t endTime = clock(); clock_t clockTicksTaken = endTime - startTime; double timeInSeconds = clockTicksTaken / (double) CLOCKS_PER_SEC; streamsize prec = cout.precision(); cout << "Elapsed (in seconds): " << setprecision(30) << timeInSeconds << setprecision(prec) << endl; return 0; } // for testing extract_fails_v4 - that uses list<Student_info> int test_extract_fails_v4() { list<Student_info> students; Student_info record; // read and store all the student's data. while (read(cin, record)) students.push_back(record); // Extract the failed students and time-stamp it! clock_t startTime = clock(); list<Student_info> students_failed = extract_fails_v4(students); clock_t endTime = clock(); clock_t clockTicksTaken = endTime - startTime; double timeInSeconds = clockTicksTaken / (double) CLOCKS_PER_SEC; streamsize prec = cout.precision(); cout << "Elapsed (in seconds): " << setprecision(30) << timeInSeconds << setprecision(prec) << endl; return 0; }
test_extract_fails.h
#ifndef GUARD_TEST_EXTRACT_FAILS_H #define GUARD_TEST_EXTRACT_FAILS_H int test_extract_fails_v3(); int test_extract_fails_v4(); #endif // GUARD_TEST_EXTRACT_FAILS_H
Test Program
I’ve run 3 trials for every individual tests and use these to compute the average job run duration for the extract_fail_v3 (vector) and extract_fail_v4 (list).
This is my sample set of 10 input lines.
Kate 88.88 88.88 88.88 88.88 88.88 John 55.55 55.55 55.55 55.55 55.55 Pat 66.66 66.66 66.66 66.66 66.66 Joe 100.00 100.00 100.00 100.00 100.00 Mary 11.11 11.11 11.11 11.11 11.11 Bill 33.33 33.33 33.33 33.33 33.33 Jay 22.22 22.22 22.22 22.22 22.22 Bob 4.00 4.00 4.00 4.00 4.00 Fred 77.77 77.77 77.77 77.77 77.77 Louis 44.44 44.44 44.44 44.44 44.44
To test out the case of 100, 1000, and 10000 lines, I simply “copy and paste” these 10 lines to fill the required number of lines. (This seems logical as our objective for this exercise is to compare performance of vector and list)
Test Result
File size | Extract_fails_v3 (vector<Student_info>) | Extract_fails_v4 (list<Student_info>) |
10 lines | 0.000 seconds | 0.000 seconds |
100 lines | 0.003 seconds | 0.001 seconds |
1,000 lines | 0.165 seconds | 0.006 seconds |
10,000 lines | 9.300 seconds | 0.045 seconds |
From this we have just verified the hypothesis in S5.2.2/87 of the book – in which the Author suggested the exponential increase of job-run time when the input volume go beyond the magnitude of 10,000 lines.
Instead of writing a few lines of random input, I created a random data generator specifically for generating random data for the program in this exercise. I wasted so much time but it was worth it in the end.
Good Job