Exercise 5-6
Rewrite the extract_fails function from S5.1.1/77 so that instead of erasing each failing student from the input vector students, it copies the records for the passing students to the beginning of students, and then uses the resize function to remove the extra elements from the end of students. How does the performance of this version compare with the one in S5.1.1/77?
Solution
The original (version 2) extract_fail function in the text book S5.1.1/77 looks like this – it uses the vector<XXX>::erase function.
{ Student_group fail; Student_group::size_type i = 0; // invariant: elements [0,i) of students represent passing grades while (i != students.size()) { if (fgrade(students[i])) { fail.push_back(students[i]); students.erase(students.begin() + i); } else ++ i; } return fail; }
Note that I have used the technique developed from my Solution to Exercise 5-4, in which all the vector<Student_info> are replaced by Student_group, using the C++ typedef. (Just trust me for now there is a typdef somewhere else that defines Student_group) I believe it makes sense to build on what we have learnt!
The newly “re-written” function will look like this it uses a combination of vector<XXX>::insert and vector<XXX>::resize functions.
Student_group extract_fails_v2_5p6(Student_group& students) { Student_group fail; typedef Student_group::size_type VecSize; VecSize countPass = 0; // invariant: elements [0,i) of students represent passing grades VecSize i = 0; while (i != students.size()) { if (fgrade(students[i])) fail.push_back(students[i]); else { students.insert(students.begin(), students[i++]); ++countPass; } ++i; } students.resize(countPass); return fail; }
If after skimping through this new code and you fully understand the concepts / what’s going on, then congratulation you are done!
From a series of tests the new version happens to take 50% more time than the original version (when reaching around 10,000 lines of input) – performance results can be found at the bottom of the page under the Test Result section.
If you however would like to know more in details, such as how I derive this new code logic, then read on!
Code Logic – The insert/resize Method
The way I derive the eventual code was simply doing some sketches on paper (In face this is how I usually solve problems – making sketches and pictures!). Below shows the diagrams that I drawn – essentially I started with a hypothetical example and just work my way through using logic. Some notes:
- The countPass is a counter to keep track of the number of passing students – this will be used later on to feed the vector<XXX>::resize function.
- The i, is just an incrementing variable (our good old friend), to facilitate a while loop.
Have we reached the end of the vector? (yes)
Use the resize function to keep only the passing students in vector<Student_info> students.
And these are the final output!
An Important Learning – iterator
This exercise has exposed an interesting fact: iterator does NOT work when the underlying vector changes size! i.e. the following code will not work (even though it looks correct). One must use vector indexing instead! (I spent an hour debugging this – a hard lesson learnt!). Must be a feature of vector iterator vs vector index.
Student_group extract_fails_v2_5p6(Student_group& students) { Student_group fail; Student_group::iterator i = students.begin(); Student_group::size_type countPass = 0; // invariant: elements [0,i) of students represent passing grades while (i != students.end()) { if (fgrade(*i)) fail.push_back(*i); else { students.insert(students.begin(), *i); ++countPass; ++i; } ++i; } students.resize(countPass); return fail; }
The Project
Here are the complete set of header and source files that enable me to perform the test.
Source File List
Header File List
Source Files
main.cpp
#include <iostream> #include <iomanip> #include <ios> #include <ctime> #include "Student_info.h" #include "Student_group.h" // switch between vector and list based Student_group here #include "grade.h" #include "extract_fails.h" using std::cin; using std::cout; using std::endl; using std::setprecision; using std::streamsize; int main() { Student_group students; Student_info record; // read and store all the student's data. while (read(cin, record)) students.push_back(record); // Extract the failed students and time-stamp it! clock_t startTime = clock(); Student_group students_failed = extract_fails_v2(students); /* Student_group students_failed = extract_fails_v2_5p6(students); */ clock_t endTime = clock(); clock_t clockTicksTaken = endTime - startTime; double timeInSeconds = clockTicksTaken / (double) CLOCKS_PER_SEC; streamsize prec = cout.precision(); cout << "Elapsed (in seconds): " << setprecision(30) << timeInSeconds << setprecision(prec) << endl; // Uncomment the following block to display passing and failing students /* cout << endl; // Report passing students cout << "These students have passed." << endl; for (Student_group::const_iterator iter = students.begin(); iter != students.end(); ++iter) cout << iter->name << " (" << grade(*iter) << ")" << endl; cout << endl; // Report failing students cout << "These students have failed." << endl; for (Student_group::const_iterator iter = students_failed.begin(); iter != students_failed.end(); ++iter) cout << iter->name << " (" << grade(*iter) << ")" << endl; */ return 0; }
extract_fails.cpp
#include "Student_info.h" #include "Student_group.h" #include "grade.h" // separate passing and failing student records // derived from S5.1.1/77 Student_group extract_fails_v2(Student_group& students) { Student_group fail; Student_group::size_type i = 0; // invariant: elements [0,i) of students represent passing grades while (i != students.size()) { if (fgrade(students[i])) { fail.push_back(students[i]); students.erase(students.begin() + i); } else ++ i; } return fail; } // Required by Exercise 5-6 Student_group extract_fails_v2_5p6(Student_group& students) { Student_group fail; typedef Student_group::size_type VecSize; VecSize countPass = 0; // invariant: elements [0,i) of students represent passing grades VecSize i = 0; while (i != students.size()) { if (fgrade(students[i])) fail.push_back(students[i]); else { students.insert(students.begin(), students[i++]); ++countPass; } ++i; } students.resize(countPass); return fail; }
grade.cpp
#include <stdexcept> #include <vector> #include "grade.h" #include "median.h" #include "Student_info.h" using std::domain_error; using std::vector; // definitions for the grade functions from S4.1/52, S4.1.2/54, S4.2.2/63 // compute a student's overall grade from midterm and final exam // grades and homework grade (S4.1/52) double grade(double midterm, double final, double homework) { return 0.2 * midterm + 0.4 * final + 0.4 * homework; } // compute a student's overall grade from midterm and final exam grades // and vector of homework grades. // this function does not copy its argument, because median (function) does it for us. // (S4.1.2/54) double grade(double midterm, double final, const vector<double>& hw) { if (hw.size() == 0) throw domain_error("student has done no homework"); return grade(midterm, final, median(hw)); } // this function computes the final grade for a Student_info object // (S4.2.2/63) double grade(const Student_info& s) { return grade(s.midterm, s.final, s.homework); } // predicate to determine whether a student failed // (S5.1/75) bool fgrade(const Student_info& s) { return grade(s) < 60; }
median.cpp
// source file for the median function #include <algorithm> #include <stdexcept> #include <vector> using std::domain_error; using std::sort; using std::vector; // compute the median of a vector<double> double median(vector<double> vec) { typedef vector<double>::size_type vec_sz; vec_sz size = vec.size(); if (size == 0) throw domain_error("median of an empty vector"); sort(vec.begin(),vec.end()); vec_sz mid = size/2; return size % 2 == 0 ? (vec[mid] + vec[mid-1]) / 2 : vec[mid]; }
Student_info.cpp
#include "Student_info.h" using std::istream; using std::vector; // we are interested in sorting the Student_info object by the student's name bool compare(const Student_info& x, const Student_info& y) { return x.name < y.name; } // read student's name, midterm exam grade, final exam grade, and homework grades // and store into the Student_info object // (as defined in S4.2.2/62) istream& read(istream& is, Student_info& s) { // read and store the student's name and midterm and final exam grades is >> s.name >> s.midterm >> s.final; // read and store all the student's homework grades read_hw(is, s.homework); return is; } // read homework grades from an input stream into a vector<double> // (as defined in S4.1.3/57) istream& read_hw(istream& in, vector<double>& hw) { if (in) { // get rid of previous contents hw.clear(); // read homework grades double x; while (in >> x) hw.push_back(x); // clear the stream so that input will work for the next student in.clear(); } return in; }
Header Files
extract_fails.h
#ifndef GUARD_EXTRACT_FAILS_H #define GUARD_EXTRACT_FAILS_H // extract_fails.h #include "Student_group.h" Student_group extract_fails_v2(Student_group&); Student_group extract_fails_v2_5p6(Student_group&); #endif // GUARD_EXTRACT_FAILS_H
grade.h
#ifndef GUARD_GRADE_H #define GUARD_GRADE_H // grade.h #include <vector> #include "Student_info.h" double grade(double, double, double); double grade(double, double, const std::vector<double>&); double grade(const Student_info&); bool fgrade(const Student_info&); #endif // GUARD_GRADE_H
median.h
#ifndef GUARD_MEDIAN_H #define GUARD_MEDIAN_H // median.h #include <vector> double median(std::vector<double>); #endif // GUARD_MEDIAN_H
Student_group.h
#ifndef GUARD_STUDENT_GROUP_H #define GUARD_STUDENT_GROUP_H // Student_group.h #include <list> #include <vector> #include "Student_info.h" // ********************* AMEND THIS *************************** // Solution to Exercise 5-4: Pick one of the followings typedef std::vector<Student_info> Student_group; //typedef std::list<Student_info> Student_group; // ************************************************************ #endif // GUARD_STUDENT_GROUP_H
Student_info.h
#ifndef GUARD_STUDENT_INFO_H #define GUARD_STUDENT_INFO_H // Student_info.h #include <iostream> #include <string> #include <vector> struct Student_info { std::string name; double midterm, final; std::vector<double> homework; }; bool compare(const Student_info&, const Student_info&); std::istream& read(std::istream&, Student_info&); std::istream& read_hw(std::istream&, std::vector<double>&); #endif // GUARD_STUDENT_INFO_H
Test Program
Just simply amend the following line in the main() program accordingly for test purposes.
Student_group students_failed = extract_fails_v2(students); /* Student_group students_failed = extract_fails_v2_5p6(students); */
Performance Comparison
I’ve run a few tests and recorded the job run time for the extract_fail step. from this it looks like the insert/resize method is actually slower than the erase method. But then, I guess it depends on the input file – i.e. depending on the ratio of passing students to failing students, the sequence of the passing and failing students, and potentially other factors.
v2 (erase) | v2_5p6 (insert/resize) | |
10 lines | 0.001 seconds | 0.000 seconds |
100 lines | 0.003 seconds | 0.004 seconds |
1,000 lines | 0.156 seconds | 0.212 seconds |
10,000 lines | 9.79 seconds | 16.6 seconds |
Using an iterator does appear to work, though I believe the insert invalidates all existing related iterator references. Also I didn’t notice that much of a difference in running time on an input of 10k. Interesting differences.
You can use iterators but you’ll need to reinit them after the insert and keep track of where you we’re so you can increment back to where you were.
But this way you can’t actually use list because list doesn’t support indexing, so I’ve been trying to find a way with iterators but I can’t… anyone who could give me a clue on the iterators way?
Here is a solution that uses iterator
// Chapter 5 problem 5.6 : using insert in the beginning and resize the vector instead of using erase
Student_group extract_fails_insert(Student_container& students)
{
Student_group::const_iterator iter = students.begin();
Student_group fail;
using Student_size = Student_group::size_type;
// how many passing students has been inserted
Student_size pass = 0;
// how many students has been read
Student_size counter = 0;
// invariant: elements [0,end)
while(iter != students.end()){
++counter;
// if students pass
if (!fgrade(*iter)){
++pass;
// insert the passing student in the beginning and restart iter from beginning of students
iter = students.insert(students.begin(),*iter);
// key: ignore what has been inserted (pass) and what has been read from students (counter)
for (Student_size i = 0; i != pass + counter; ++i)
++iter;
} else {
fail.push_back(*iter);
++iter;
}
}
// resize to only contain passing students
students.resize(pass);
return fail;
}
Hello Johnny,
First of all : thank you for sharing your work, it is helping me a lot with the book.
About this excercise, I did not have the same interpretation of the question statement.
I think the idea was to insert the students at the beginning of the vector, but not to insert them one by one at the very beginning. Something like :
vector extract_fails(vector& students)
{
vector fail;
int i = 0;
unsigned int successCount = 0;
while (i != students.size()) {
if (fgrade(students[i])) {
fail.push_back(students[i]);
} else
{
students[successCount++] = students[i];
}
++i;
}
students.resize(successCount);
return fail;
}
This leads to an algorithm that is more efficient than the previous two, since no additional allocation is made.
It seems you’re still copying them one by one. I don’t get it.