When a class is first-time accessed in a program, e.g. an object of this class is created or a static method is called, the class definition including one unique copy of its methods is loaded into the memory. Each method has its own memory address.

Then, every time when an object of this class is created, a copy of the data members of this class is created and put in a block of memory. Data members are placed in a fixed order one next to another. The address of an object is the address of the first byte of its memory block.

1.2 Linking of a Method Call

When compiler sees a method call such as

Woman * pw = new Woman;

pw->MakeUp();

it needs to link the individual data copy of the object to the unique implementation of the method. If the method is non-virtual, this task is done at compile time, while if the method is virtual, it is done at run time.

1.3 Memory Footprint of a Non-polymorphic Object

Suppose class Derived inherits from Base1, Base2, Base3, and none of them has virtual functions:

class Base1 {

public:

void Hi1()

{ printf("Hi from Base1!\n"); }

BYTE a1[100];

};

class Base2 {

public:

void Hi2()

{ printf("Hi from Base2!\n"); }

BYTE a2[100];

};

class Base3 {

public:

void Hi3()

{ printf("Hi from Base3!\n"); }

BYTE a3[100];

};

class Derived : public Base1, public Base2, public Base3 {

public:

void Hi()

{ printf("Hi from Derived!\n"); }

BYTE a[100];

};

void main()

{

Derived * pDerived = new Derived;

Base1 * pBase1 = (Base1 *)pDerived;

pBase1->Hi1();

Base2 * pBase2 = (Base2 *)pDerived;

pBase2->Hi2();

Base3 * pBase3 = (Base3 *)pDerived;

pBase3->Hi3();

}

The memory footprint of an object of class Derived and the addresses of the pointers will be like:

pDerived
pBase1 pBase2 pBase3
¯ ¯ ¯

0 100 200 300 400

Fig 1. Memory footprint of a non-polymorphic type

From this footprint you can see some important facts:

1. An object of a non-polymorphic class doesn’t need to carry any information about addresses of the methods, because the linking of non-virtual functions are already done by compiler at compile time. So the object contains purely data members. The references such as pBase1, pBase2, pBase3, etc. are only used to access the data members, not the methods.

2. The memory of base classes is allocated in front of the derived class, which conforms to the sequence of object construction.

3. The address of the FIRST base class is the same as the derived class.

4. When a pointer of the derived class is casted to the base class, it is pointing to the base-class part of the memory.

1.4 Non-virtual Function Call

When compiler sees a call to a non-virtual method:

because only the class which declares the method can implement it, compiler will directly link the call to the address of the specific method at compile time.

1.5 Virtual Function Call Using vPtr and vtable

When compiler sees a call to a virtual method:

void Go(Vehicle * pv)

{

pv->StartEngine();

pv->Accelerate();

}

it has no idea on the addresses of the virtual methods, because when method Go is called at run time, the parameter Vehicle * pv can be passed a pointer to an object of any derived class, such as FamilyCar, 4WD, PickUpTruck, Van, etc., each with its own implementations of StartEngine and Accelerate at different memory locations.

Therefore, there must be a mechanism for a program to figure out the locations of the virtual methods at run time.

Now consider the following virtual function call:

pBase2->Hi2();

A late-binding process involves the following activities:

1. Compiler adds a hidden vPtr member to the class, and generates one unique vtable for the class.

At compilation time, when compiler sees the definition of a class with virtual methods, it will build a virtual table (vtable) for the class, which is an array of function pointers to the implementations of all the virtual methods, and add a hidden data member vPtr to the class definition as the FIRST data member.

Now suppose the methods of classes in Fig. 1 (Hi, Hi1, Hi2, Hi3) are all virtual functions. The memory footprint of an object of class Derived becomes:

pDerived
pBase1                                  pBase2                                      pBase3
¯                                            ¯                                                ¯

vPtr

vPtr2

vPtr3

0 4 104 108 208 212 312 412

a1 are data members of base class 1, a2 are of base class2, …, a are of base class 1.

Fig. 2. Memory footprint of a polymorphic type object

As you can see in the memory footprint, if you use a Base2 pointer to receive a Derived object, for example, this pointer will point to memory offset 104 as pBase2 does.

Note that each Derive object will have its own memory footprint, with the same structure but in different memory locations. However, the vPtrs will all be pointing to the same method implementations, in other words, the vPtr2 of two instances will contain the same address.

The derived-class and the first base class shares the same vPtr, which points to their shared merged vtable (see following section “Inheritance of Base-class vPtrs” for details). The rest of the base classes have their own vPtrs.

Note that no matter how complicated the inheritance hierarchy is, a function pointer in the vtable always points to the latest/lowest implementation of the virtual function in the inheritance hierarchy.

2. Compiler generates code to do dynamic binding using the vtable.

At compilation time, when compiler sees a call to a virtual method thourgh a pointer (pBase2->Hi2( )), it knows that the address of the function is only known at run time, so it will not try to find the implementation of the function. Instead, it knows that the pointer (pBase2) will be pointing to a vPtr at run time. So it generates code to go through the vPtr to find the vtable (whose composition is already know from the type of the pointer), and go to a certain entry of that vtable, fatch that function pointer, and make the call.

3. At run time, when an object is created out of this class definition, its vPtr member will be assigned the address of the class’s vtable.

1.6 A Language Needs Compiler and Type Information to Utilize vtable

To directly link the address of the implementation of a function at compile time (for a non-polymorphic object), or to generate code to find it out at run time (for a polymorphic one), the compiler needs to access the class definition of the pointer, and the type definition of the pointer is all it wants to know. Because of this, even if pBase2 is actually pointing to an object of class Derived, it can not access class Derived’s method Hi, because the code generated by the compiler only knows the vtable of class Base2. For the same reason, the following function call doesn’t work:

void * pvoid = new Derived;

pvoid->Hi();

Scrip languages such as VBScript, JavaScript and early versions of VB do not have a compiler and do not do compilation. They explain and run the code line by line at run time. Therefore, they can not generate code before run to access the vtable. So they can not utilize vtable and can not directly use polymorphism.

1.7 Polymorphic Base Classes are placed at the front

Now suppose only Base2 and Base3 have virtual methods while Base1 doesn’t. The memory footprint of Derived will become:

pDerived
pBase2                                    pBase3                                     pBase1
¯                                              ¯                                                ¯

vPtr

0 4 104 108 208 308 408

Fig. 3. Memory footprint of a polymorphically mixed type

You can see that the memory block of polymorphic base classes (Base2 and Base3) are moved to the front, while the non-polymorphic base classes (Base1) are moved to the back.

1.8 Inheritance of Base-class vPtrs

From the Fig.2 you can see, the vPtr member of the second and third base class was inherited, but the vPtr of the first base class is not. Suppose class D inherits only from C, C only from B, and B only from A. When compiler construct the vtable for D, it directly merges the vtables of A, B and C into D’s, so that class D has only one vPtr pointing to one vtable. The vtable contains function pointers to the lowest implementations of all virtual functions across the inheritance hierarchy, no matter where they are defined.

However, due to multiple inheritance, a class may indirectly inherit from many classes. If we decide to merge the vtables of all the base classes into one, the vtable may become very big. To avoid this, instead of discarding the vPtrs and vtables of all base classes and merging all vtables into one, the compiler only does it to all the FIRST base classes, and retains the vPtrs and vtables of all the subsequent base classes and their base classes.

In other words, in an object’s memory footprint, you can find the vPtrs of all its base classes all through the hierarchy, except for all the “first-borns”.

In Fig. 2, virtual function Hi3 is defined in class Base3. When it is called:

pDerived->Hi3();

even if it is implemented in class Derived, the program will still go to the vPtr of Base3 (offset 208 in the footprint), then to Base3’s vtable, then to Hi3’s function pointer, which points back to the implementation in class Derived. We can prove it by setting Base3’s vPtr in offset 208 to 0. Thus the address of Base3’s vtable is lost, and error will cause the program to be shut down:

pd->Hi3(); // virtual function Hi3 is defined in Base3. Works fine here

::memset( (Base3 *)((DWORD)pd + 208), 0, 4);

pd->Hi3(); // Error happens!

In a typical case, the derived class inherits from a group of base classes, which can be interfaces (classes with only public pure virtual functions) or classes with some implementations. The derived class does not define any new virtual function, it only implements the virtual functions defined in the base classes, or simply group the services of the base classes together. In such a case, the vtable of the derived class is simply the vtable of the first base class. Therefore, the first vPtr in the derived-class object points to the first base class’s vtable, the second vPtr points to the second base class’s vtable, and so on. Everything is clean and neat.

1.9 De-referencing the vPtr

Comparing Fig.2 with Fig.1, because the binding of virtual methods are only done at run time, the object contains not only data members but also vPtrs telling the addresses of all the virtual functions. When you have a reference say pBase2, you know that the first four bytes from the given address is a vPtr of Base2, followed by the data members of Base2.

Base2 * pBase2 = new Derived;

pBase2->Hi2(); // OK!

cout << pBase2->a2[3]; // OK!

pBase2->Hi3(); // Compile error: 'Hi2' : is not a member of 'Base3'

Because of this structure, if you only want to access virtual functions defined in Base2 and its own part of data members, not the virtual functions and data members of other classes, what you really need is the address of Base2’s vPtr.

2. Multiple Inheritance

2.1 Multiple Inheritance and Interface

It is commonly believed that multiple inheritance tends to mass things up. That's why Java language which is a simplified version of C++ forbids multiple inheritance. However, multiple inheritance from interfaces can be very helpful. That's why Java specifically introduced the concept of interface. In C++ interface can be represented by an abstract class with all methods being pure virtual.

An interface groups a number of methods which are logically closely related. It represents one of possibly many aspects of a data type. For example, a Human class may depict many aspects of a person – a person has its psychological aspect, physical aspect, experience aspect, technical skills aspect, hobbies aspect, family aspect, etc. If all methods are grouped in one class, firstable there may be a hundred classes and it may appear to be quite dounting, secondly if another program only wants to deal with one aspect of Human class say the physical aspect such as weight, height, blood pressure, etc., it will have to store a reference to the whole class, and will be able to know about all other unneeded aspects of the class such as the person's technical skills. This is typically not desireable. We want objects to know as little as possible about each other, so that they are loosely coupled.

If we have a group of interfaces each representing one aspect of Human, and let Human inherit from and implement all these interfaces, such as IPsychological, IPhysical, ITechnical, IFamily, then e.g. interface IPhysical will only include methods such as GetHeight, GetWeight, GetBloodPressure, etc. It is logically more simple, and more importantly, if an other program only wants to deal with the physical aspect of Human, it can store a IPhysical reference, then it does not know any other aspects of the Human class. Information hiding is much better achieved.

This mechanism is enforced in COM programming. Clients can never get a reference to the whole class. They only get references to interfaces.

2.2 Virtual Inheritance

When one class inherits from more than one classes, that is called multiple inheritance. Among multiple inheritance, there is a special case called diamond inheritance. One example is: class ostream and istream both inherits from class ios, and class iostream inherits from both istream and ostream.

In polymorphism, when we let a base-class pointer point to a derived-class object, that pointer actually points to the base-class object within the derived-class object. In diamond inheritance, however, because the derived class contains two duplicated base-class objects, compiler doesn’t know which part the pointer should point to. Therefore it will prompt an error message.

Example:

class Base {};

class Derived1 : public Base {};

class Derived2 : public Base {};

class Multi : public Derived1, public Derived2 {};

The following statement is illegal:

Base * ptr = new Multi;

But the following statements are legal:

Derived1 * ptr1 = new Multi;

Derived2 * ptr2 = new Multi;

because the compiler can locate the Derived1 or Derived2 part of object in Multi object.

To solve this problem, use virtual inheritance, which only allows one sub-object:

class Base {};

class Derived1 : virtual public Base {};

class Derived2 : virtual public Base {};

class Multi : public Derived1, public Derived2 {};

int main()

{ Base * ptr = new Multi; }

2.3 Method Name Clash and Ambiguity in Multiple Inheritance

Suppose class Derived inherits from two base classes Base1 and Base2, each has an implemented virtual method Hi with the same signature. If class Derived provides its own implementation of Hi, then the function pointers of method Hi in the two vtables of Base1 and Base2 will both point to Derived’s implementation. Therefore the two different implementations provided by Base1 and Base2 are supressed.

If Derived does not provide its own implementation, the two implementations in the base classes are retained. However, when you call this method through a reference of Derived, no one knows which implementation you want to call. Compiler will complain over this call. You can only call through a base-class vPtr, so that the program knows which implementation you are calling:

#include "stdafx.h"

struct Base1 {

virtual void Hi()

{ printf("Hi from Base1!\n"); }

};

struct Base2 {

virtual void Hi()

{ printf("Hi from Base2!\n"); }

};

struct Derived : public Base1, Base2 {};

int main(int argc, char* argv[])

{

Derived * pDerived = new Derived;

pDerived->Hi(); // Ambiguity!

((Base1 *)pDerived)->Hi(); // Base1::Hi will be called

((Base2 *)pDerived)->Hi(); // Base2::Hi will be called

return 0;

}

3. Templates

3.1 Function Template

Suppose we want to write a function which receives two numbers and prints them out. It may either receive an int or a float or a char. If we use overloaded method, we have to create nine overloaded methods for the combination, such as:

void print(int a, int b)

{ cout << a << b; }

void print(int a, float b)

{ cout << a << b; }

void print(int a, char b)

{ cout << a << b; }

void print(float a, int b)

{ cout << a << b; }

void print(float a, float b)

{ cout << a << b; }

But the implementations in different overloaded methods are identical. There is another more compact way – using method template:

template < class T1, class T2 > // T1 & T2 are called “Type Parameter”.

void print(T1 a, T2 b)

{ cout << a << b; }

Then, when the compiler sees a method call

print(3.0, ‘M’)

it will match T1 with 3.0 i.e. float, and T2 with ‘M’ i.e. char. Then it will generate a brand new method from the template:

void print(float a, char b)

{ cout << a << b; }

and then compile it.

Keyword class here means any user-defined or built-in type.

Method template can also be overloaded, just like overloading normal methods.

You may have noticed that in the method body the type parameter “T1” and “T2” never appears.

3.2 Class Templates

Class templates are also called parameterized types. The principle of class template is the same as method template:

template < class T1, class T2 > // Before class definition!

class Test {

public:

Test(T1, T2);

void set(T1 a0, T2 b0);

void print() const;

private:

T1 a;

T2 b;

};

template < class T1, class T2 > // Before each method!

Test<T1, T2>::Test(T1 a0, T2 b0) // Class name is tied to the type parameters

:a(a0), b(b0) {}

template < class T1, class T2 >

void Test<T1, T2>::set(T1 a0, T2 b0)

{ a = a0;

b = b0; }

template < class T1, class T2 >

void Test<T1, T2>::print() const

{ cout << a << " " << b << endl; }

int main()

{

Test<float, int> t1(5.0, 5); // When creating object

Test<char, float> t2('X', 5.0);

t1.print();

t2.print();

}

Suppose “Test” is the name of a class template with type parameter “X”. When you declare new object of class “Test” in the class definition of another class, always use the following format:

Test<X> t1;

When creating a new object, use the following format:

Test<int> t1;

Test<float> t2;

3.3 The Power of Type Extensibility with Class Template

Class template enables you to write a class which suits all types. Because user-defined types (classes) can also use the same operators such as “=”, “= =”, “<<”, “>>” “+”, “-” etc. as primitives, virtually any user-defined type can be used in any class template. In other words, when we are writing a template, virtually we shouldn’t pay any attention to whether the operand is a built-in type object or a class object.

3.4 Non-Type Parameters in Class Template

For class templates, not only type can create different template classes, but also non-type parameters. Any parameter can be used. Because only memory size must be fixed at compile time, all other parameters or variables used in a program can be decided at run time, in real cases only memory size parameters are used as non-type parameters to create template methods. Non-type parameters can have default arguments.

template<class T, int size = 1>

class Test{

private:

T array[size];

public:

Test();

void print() const;

};

template<class T, int size>

Test<T, size>::Test()

{ for(int i=0; i < size; i++)

cin >> array[i]; }

template<class T, int size>

void Test<T, size>::print() const

{ for(int i=0; i < size; i++)

cout << array[i] << endl; }

int main()

{ Test<int, 3> a1;

Test<float, 5> a2;

a1.print();

a2.print();

}

The use non-type parameter to create template classes with different memory sizes reminds us of using dynamic memory allocation. To create array in a normal way requires you to determine the memory size when writing your class; by using non-type parameter you can let the client who uses your code decide it; by using dynamic memory allocation you can decide the memory size at run time. So the flexibility is different.

3.5 Templates and Friends

Inside a class template X that had been declared with

template < class T>

class X {

...

};

There are the following ways of friendship declaration:

1. friend void f1( );
f1( ) is a friend of all template classes.

2. friend void A::f1( );
method f1( ) of class A is a friend of all template classes.

3. friend void f1( X<T> & );
f1( X<float> & ) will be a friend of class X<float> only, and f1( X<int> & ) will be a friend of class X<int> only.

4. friend void C<T>::f1( X<T> & );
method C<float>::f1( X<float> & ) will be a friend of class X<float> only. “C” is the name of another class.

5. friend class Y;
every method of class Y is a friend of all template classes.

6. friend class Y<T>;
class Y<float> is a friend of class X<float> only.

4. Exception Handling

4.1 What is Exception Handling

First consider a normal way to handle errors:

const float reciprocal( const float denominator )

{

if(denominator == 0)

{

cout << “Divide by zero! \n”; // Exception handling code

abort();

} // header file is <stdlib>

return 1/denominator;

}

This “reciprocal( )” method calculates the reciprocal of a given float number. The if block forms an exception handler.

The problems with such normal way of error handling are:

1. When the exception handling code becomes more complicated, the normal part of program becomes “polluted” with the exception handling code, and makes it more difficult and complicated to understand and maintain.

2. When you write a program you have to handle the errors yourself, or you should invoke a known class to handle it. There is no way that you can just detect the error, “publish” the error message, so that the client who makes use of your code can handle it in his own way.

The second reason is the major reason which calls for exception handling technique. There are cases when the place where the error happens is not the right place to handle it. Instead it is more appropriate to notice the caller of the method and let him handle it.

4.2 Do Not Use Exception Handling on Normal Application Code

There are the following reasons that exception handling shouldn't be used on normal application code:

1. Exception handling is designed for error handling which very seldom happens. So the compiler is not required to optimize it well as it does on normal application code.

2. Flow of control with conventional control structures is generally clearer and more efficient than with exceptions.

3. When exceptions happens, the stack is unwound and resources allocated prior to the occurrence of the exception may not be freed. This problem can be avoided by careful programming.

4.3 try Block

A try block contains the code which may throw the exception, or contains the code that calls the method which may throw the exception.

4.4 throw

A throw keyword specifies an operand which can be of any type. The throw statement is actually passing information to the exception handler in the catch block. This information can be either only be the type of the thrown object – in many cases it is enough, or plus the extra information carried in the thrown object. The thrown object can be a class object or just a character string:

int main()

{

try

{ throw ("Testing string!"); }

catch (char * s)

{ cout << s << endl; }

}

4.5 Throw Point

The point where the exception is thrown is called throw point. When an exception is thrown, the control leaves the try block and searches for a catch block which can match this exception. If one catch block catches the exception, the code included will be executed, then the control will resume to the first statement after the last catch block. All codes between the throw point and the last catch block will be skipped. The control has no way to return to the throw point.

If the exception is not caught, the method call stack is unwound and control will further more exit to the outer scope and continue searching the following catch blocks. If it is already the most outer block such as main, method terminate is called, which by default will call method abort.

4.6 catch Block

¨ Parameter Name in catch header

A catch keyword specifies in “( )” the type of the object to be caught, and an optional parameter name.

When exception happens, a temporary copy of the thrown object is created and initialized. If the catch block has defined a parameter name of that type, this name will be given to this temporary object and used to access the data members and methods of the object. If the catch block does not have an parameter name, then the thrown object’s internal information can not be accessed, and it is only used to match the catch block. After handler finishes, the temporary object is destroyed.

So if you want to pass addtional information from the thrown point to the catch block, you have to put this information inside the thrown object (as its data member) and define an object name in the catch block, so that you can access the information through it.

¨ catch all

The following catch specification

catch(...)

will catch all exceptions.

¨ catch through inheritance

Due to inheritance hierarchies, a derived-class object can be caught by a handler specifying either the derived-class type or the base-class type.

As a general rule, put the specific type handlers in the front, such as derived-class type handlers, and put the generic handlers after, such as base-class type handlers, and put “catch (... )” at the last. If you reverse the sequence, the exceptions will all be caught by generic handlers and will never get to the specific ones.

A “void *” handler will catch all exceptions of pointer type.

¨ Release the Resource in catch Block

When an exception happens, the try block terminates and all automatic objects inside the try block are destroyed before the handler begins executing. However, it is possible that some allocated resources are not yet released in the try block. The catch handler, if possible, should release these resources, e.g., delete the space allocated by new, and close any files opened in the try block.

¨ Copy Constructor of the Thrown Object

The handler must have access to the copy constructor of the thrown object. It means that the header file of the thrown object should be included.

4.7 Rethrow an Exception

A catch handler itself can discover an error and throw an exception. Or it may do some primary handling and then throw it again. To rethrow, simply say

throw;

A rethrown exception will exit the outer try block and search the following handlers.

“catch (... )” can be used to perform recovery that doesn’t depend on exception types, such as releasing common resources. Then it may rethrow the exception to alert more specific catch handlers.

4.8 Exception Specifications

An exception specification (also called throw list) is throw followed by a list of types enclosed in "( )":

void print() throw(a, b, c)

{...}

It can be placed after any method header, to restrict the kind of exceptions that can be thrown by this method. This is used to notice the client who make use of this code the types of possible exceptions your code may throw.

All derived-class types based on the throw list types can also be thrown.

If any exception not listed in the throw list is thrown, method unexpected is called, which will call the method specified by method set_unexpected, or by default it will call method terminate.

Method terminate will call the method specified by method set_terminate, or by default it will call method abort.

Header files of these methods

unexpected, set_unexpected: <unexpected>

terminate, set_terminate: <terminate>

abort: <stdlib>

Method set_unexpected and set_terminate take method names i.e. function pointer as arguments. The methods must be with void return type and no arguments. Method set_unexpected and set_terminate returns a the name of the method last called by terminate or unexpected. This enables the programmer to save the name and use it later.

If the use-defined method specified in set_terminate or set_unexpected does not exit the program, method abort will be automatically called to end the program execution at last.

If you include exception "std::bad_exception" (derived from base class "exception" whose prototype is in header file <exception>) in the throw list of a method, when unexpected throw happens, method unexpected will throw "bad_exception" instead of calling terminate by default or user-defined method specified by method set_unexpected.

An empty throw list "throw" means that the method wouldn't throw any exception without alerting method unexpected.

4.9 Stack Unwinding

When an exception is thrown but not caught in a certain scope, the method call stack is unwound , which means that the method which throwns the exception terminates, all local objects destroyed, then control returns to the point where this method was called.

If that calling point is in a try block, control will leave that try block and search the following handlers. If that point is not in a try block or the search fails, stack unwinding will happen again -- until it reaches main, and method terminate will be called, as said before.

#include <iostream>

#include <stdexcept>

using namespace std;

void method3() throw(runtime_error)

{

throw runtime_error( "Thrown by method 3!" );

cout << "In method3: shouldn't appear! \n";

}

void method2() throw(runtime_error)

{

method3();

cout << "In method2: shouldn't appear !\n";

}

void method1() throw(runtime_error)

{

method2();

cout << "In method1: shouldn't appear !\n";

}

int main()

{

try { method1(); }

catch ( runtime_error e )

{ cout << "Exception caught: " << e.what() << endl; }

cout << "In main: should appear! \n";

int a;

cin >> a;

}

As a general result, if you don’t provide any exception handling code and exception happens, the program will be terminated. But if you provide a proper exception handling measure, the program can go on working.

4.10 Exception Thrown from Constructor and Destructor

When an exception is thrown in a constructor, destructors for all the local objects created as part of the object being constructed will be called. If a destructor which is called to unwind the stack throws out an exception, method terminate will be called.

Normally you should avoid throwing an exception out of a constructor without catching it and doing necessary clean up in the constructor first. A simple solution is to use default constructor to create a very "safe" model of object, then use another method which is typically called "Init" in many applications to initialize it.

The following example demonstrates why you should avoid throwing exception out of constructor without catching and processing it first. Suppose in the constructor before the throw point you have allocated some dynamic memory with operator new to a pointer data member:

class Employee {

...

Image * const m_pImage;

AudioClip * const m_pAudio;

};

Employee(..., string image, string audio) : ..., m_pImage(NULL), m_pAudio(NULL)

{

...

m_pImage = new Image(image);

m_pAudio = new AudioClip(audio); // throw point

}

When the exception is thrown by "new AudioClip(audio)", the constructor is exited, the dynamic memory become memory leak. The destructor of the class is not called, because a destructor is only called for a fully constructed object, and because the constructor is half-way termicated by the throwing of exception, the object is not fully constructed.

The correct way to deal with such situation is to catch possible exception in the constructor, and in the catch block, delete all dynamic allocated resources, then rethrow the exception again as a notification:

Employee(..., string image, string audio) : ..., m_pImage(NULL), m_pAudio(NULL)

{

...

try {

m_pImage = new Image(image);

m_pAudio = new AudioClip(audio);

}

catch(...)

{

delete m_pImage;

delete m_pAudio;

throw;

}

This solution works fine except if the class uses constant pointer members:

class Employee {

...

Image * const m_pImage;

AudioClip * const m_pAudio;

};

Such pointer members must be initialized in the member initializer, where you can not put try and catch blocks.

The ultimate solution is: use auto_ptr< > class or similar classes to wrap the raw pointers, so that they become local objects:

class Employee {

...

auto_ptr<Image> m_image;

auto_ptr<AudioClip> m_audio;

};

As we already know, when a constructor is terminated by exception, all local objects are destroyed by calling their destructors. The destructor of auto_ptr< > will delete the object pointed by the wrapped pointer.

4.11 Resource Leak

If a resource is allocated before the throw point and supposed to be released after the throw point, when exception happens it prevents the resource from being released. This is called resource leak. If the resource is memory, it is called memory leak.

One technique to resolve this problem is to create a local object with a pointer which points to this resource, and define a proper destructor for this object to delete the resource. Then when exception happens, destructors for this object will be called, and the resource will be released.

C++ has already provided a class template, auto_ptr, which contains a pointer pointing to a given type. The destructor of this class destroys the resource to which the pointer points to. Operator "->" and "*" have been overloaded, so that auto_ptr objects can be used as normal pointers to access and dereference. For example,

auto_ptr<int> ptr = new Employee("Frank Liu");

ptr will be treated as a local object and its destructor will be automatically called when leaving scope, in which there is usually the following statement

delete ptr;

Header file of auto_ptr is <memory>.

4.12 Insufficient Memory when Using new

When new fails to obtain memory, older-version compilers will return 0. The ANSI/ISO C++ draft standard version specifies that when new fails, it automatically throws out a "std::bad_alloc" exception defined in header file <new>. If you still want new to return 0, you have to put "(nothrow)" after new:

int * ptr = new (nothrow) int[500];

If you pass the name of a user-defined function (which has void return type and no arguments) as an argument to method "set_new_handler", whose header file is <new>, then new will not throw the "std::bad_alloc" exception. Instead it will call that handler repeatedly until it sucessfully allocates memory. You can even overload operator new and "set_new_handler" for the class in question. For details, see Ref1.

4.13 Standard Library Exception Hierarchy

Experience shows that exceptions fall nicely into a number of categories. The C++ draft standard includes a hierarchy of exception classes. The base class is "exception", whose header file is <exception>, which offers the method "what" to access the appropriate error message.

Exception class "runtime_error" and "logic_error" are immediately derived from "exception". Their header file is <stdexcept>. Each of them has several derived classes.

From "logic_error" there are:

· invalid_argument invalid arguments are passed to a method

· length_error a length larger than the maximum size of the object is used to that object.

· out_of_range a value is out of range, such as a subscript to an array.

From "runtime_error" there are:

· overflow_error

· underflow_error

Also derived from class "exception" are the exceptions thrown by C++ language features:

· bad_alloc thrown by new when it fails to obtain memory

· bad_typeid

· bad_cast thrown by dynamic_cast.

4.14 Difference Between C++ and Java’s Exception Handling

The biggest difference is: Java compiler forces that all exceptions should be caught and handled.

5. Data Structures

5.1 Data Structure

A collection of the same type of objects is called a data structure.

5.2 Linked List

If you want to maintain a collection of the same objects, you may first think of an array of that type of objects. But as we already know, the size of the array have to be decided before compiling. Then we may think of using dynamic memory allocation to create the array, so that we can decide the memory size at run time:

Test * array = new Test [size];

But still there is limitation: once the array is created, you can not vary the size, unless you delete the whole array and create a new one. You can't expand or shrink the list dynamically.

In order to put a group of objects into a flexible data structure, you can use wrapper objects to wrap around the original objects, with some additional pointers pointing to each other. Through these pointers, all the wrapper objects can be linked together. Such a pointer-linked collection is called “linked list”.

The disadvantages of linked list is that you can not directly access one node with subscript like array. You have to find the first one, go through the list through the pointers, until you reach the one you want. The other thing to remember is: nodes in linked list are created dynamically. They are not as fast as automatic objects.

If in a data structure each of the nodes is only connected to one node downstream and one node upstream, such a structure is called a "linear linked list".

There are "open-loop" lists, each has a start node and end node. The pointer of the end node is always set to 0 to mark the end. There are also "close-loop" lists, whose start node and end node are connected together forming a circular loop of nodes.

There are Singly-linked lists whose nodes only contain one pointer pointing to the downstream node. You can't traverse upstream. There are also doubly-linked lists whose nodes each contains a pointer to both the upstream and downstream node. You can traverse on both directions.

Using pointers to link objects you can create lists as queues, which only removes a node from the front and adds a node to the back, or stacks, which only adds and removes nodes to and from the front, or any other types.

5.3 Tree

A tree is a structure whose nodes each connected to one upstream node and many down-stream nodes. A binary tree is a structure which is connected to one upstream and two downstream nodes. A tree has a root node.

5.4 Struct

A structure is a simplified class with all public members:

struct Card {

char * face;

char * suit;

};

Structure members are not necessarily stored in consecutive bytes of memory. If the word length of the computer is 2 bytes, and one data member is only one-byte long, there will be a one-byte hole between this member and the next member. Because the value stored in the hole is undefined, a structure can not be compared.

A structure is initialized the same way as an array:

Card c1 = {“Three”, “Hearts”};

If the initialization list contains not enough initializers, the rest will be initialized to 0.

5.5 Bitwise Operators

Up to now data is regarded as a whole number without considering its bit status. Internally a number is represented by binary bits. C++ provides the following bitwise operators:

1. &: bitwise AND. One bit is 1 if the corresponding bits of the both operands are both 1.

2. |: bitwise inclusive OR. One bit is 1 if one of both corresponding bits of the two operands is/are 1.

3. ^: bitwise exclusive OR. One bit is 1 if the corresponding bits of the two operands are different: one is 1 and one is 0

4. <<: left shift. Shifts the bits of the first operand left by the number of bits specified by the second operand; fill from right with 0. For example, 0001 << 2 will be 0100. Result will be undefined if the second operand is negative.

5. >>: right shift. Shifts the bits of the first operand right by the number of bits specified by the second operand; The way of filling the left is machine-dependent, some use 0 and some use the sign bit.

6. ~: complement. All 0 bits are 1 and all 1 bits are 0. For example, ~0101 will be 1010.

7. Assignment operators: &=, |=, ^=, <<=, >>=. For example: a &= b means a = a & b, a >>= b means a = a >> b.

5.6 Example: Display a Number in Binary Form

void BinaryDisplay1(unsigned value)

{

mask = 1 << 15; // 1000000000000000

for (int c = 1; c <= 16; c++)

{

cout << (value & mask ? ‘1’: ‘0’); // 1000000000000000 is regarded as true

if(c % 8 == 0)

cout << ‘ ‘;

}

cout << endl;

}

5.7 Bit Field

C++ provides the ability to specify the number of bits in which an unsigned or int member of a class or a structure (or a union) is stored. Such a member is called a bit filed member:

struct Card {

unsigned face : 4;

unsigned suit : 2;

unsigned color: 1;

};

Therefore, three members are stored in one byte. The number of bits of one member is decided by the maximum number it will represent. For example, the number of faces of cards is 13, less than 16, so member “face” can be defined with four bits.

Bit field members are accessed the same way as normal members. But they can not have addresses, and “&” can not be used, because there is no unique address of each part of one word.

Unnamed bit field with a width is used as padding, nothing can be stored in these padding bits:

structure Example {

unsigned a : 13;

unsigned : 3;

unsigned b : 4;

};

Unnamed bit field with 0 width is used to align the next bit field with next word:

structure Example {

unsigned a : 13;

unsigned : 0;

unsigned b : 4;

};

Although bit fields save space, it consumes more machine time, because extra codes have to be generated to deal with them. This is one of the many examples of space-time trade off that occur in computer science.

5.8 Character Handling Library

Characters are often manipulated as integers. Header file of the character handling library is <ctype.h>.

1. int isdigit(int c): returns true if c is a digit -- '0' to '9'.

2. int isxdigit(int c): returns true if c is a hex digit -- '0', '1',..., 'E', 'F'.

3. int isalpha(int c): returns true if c is a letter.

4. int isalnum(int c): returns true if c is a letter or a digit.

5. int islower(int c): returns true if c is a lower case

6. int isupper(int c): returns true if c is a upper case

7. int tolower(int c): if c is upper case then return c in lower case, otherwise return c unchanged

8. int toupper(int c): if c is lower case then return c in upper case, otherwise return c unchanged.

9. int isspace(int c): returns true if c is a whitespace character: newline('\n'), space, form feed('\f'), carriage return('\r'), horizontal tab('\t'), or vertical tab('\v').

10. int iscntrl(int c): returns true if c is a control character.

11. int ispunct(int c): returns true if c is a printing character other than a space, a digit or a letter.

12. int isprint(int c): returns true if c is a printing character including space.

13. int isgraph(int c): returns true if c is a printing character other than space.

5.9 String Conversion Methods

String conversion methods are in <stdlib.h>.

1. double atof(const char *): converts a numbered string to double. E.g.: if ptr = "23.85", double 23.85 will be returned.

2. int atoi(const char *): converts a numbered string to integer. E.g.: if ptr = "2593", integer 2593 will be returned.

3. long atol(const char *): converts a numbered string to long integer.

4. double strtod(const char *, char **): converts the double portion of string ptr1 to double, and locate the second pointer at the first remaining character in the first string. E.g.: if ptr1 = "1234.2ABCD", double 1234.2 will be returned and second pointer will be "ABCD".

5. long strtol(const char * ptr1, char ** ptr2, int base): converts the long int portion of string ptr1 to long integer, and locate pointer ptr2 at the first remaining character in ptr1. The long integer can be in any base: if base = 0, then it can be decimal, octal or hexadecimal; if base = 8, it is octal; if base = 10, it is decimal; if base = 16, it is hexadecimal. But the base can be any value from 2 to 36, limited only by the number of Latin letters A-Z. Using NULL for the second argument will cause the remaining portion be ignored.

6. unsigned long strtoul(const char * ptr1, char ** ptr2, int base): only difference with method strtol is the integer is unsigned.

5.10 String Handling Library “string.h”

¨ Searching character in string

1. char * strchr(const char * s, int c): locate the first c in string s, or return NULL if not found.

2. size_t strcspn(const char * s1, const char * s2): return the length of the initial segment of s1 which consists of characters not included in s2.

3. size_t strspn(const char * s1, const char * s2): return the length of the initial segment of string s1 which consists only characters in s2.

4. char * strpbrk(const char * s1, const char * s2): locate the first occurrence in s1 of any character contained in s2 or NULL if not found.

5. char * strrchr(const char * s, int c): locate the last occurrence of c in s or NULL if not found.

6. char * strstr(const char * s1, const char * s2): locate the first occurrence in s1 of s2 or NULL if not found.

¨ Memory manipulating methods

The basic unit of memory is byte, and a byte can be regarded as a character. The following methods manipulate memory buffers in characters. The pointer parameters in these methods are declared "void *". Because pointers of any type can be directly assigned to a pointer of type "void *", therefore methods can receive pointers of any type.

w void * memcpy(void * s1, const void * s2, size_t n)

Copies n characters from buffer s2 to s1, and return a pointer to the resulting buffer.

w void * memmove(void * s1, const void * s2, size_t n)

The same as memcpy, but the characters are first copied from s2 into a temporary location, then copied to s1. This allows to copy part of one object and overlap with itself:

int main()

{

char x[] = "123 456 789";

cout << (char *)memmove(x, x+4, 3);

}

then x will become "456 456 789".

w int memcmp(const void * s1, const void s2, size_t n)

Compares the first n characters of object s1 and s2, and returns 0, <0, or >0.

w void * memchr(const void * s, int c, size_t n)

Locates the first occurrence of c in the first n characters of object s, or return 0 if not found.

w void * memset(void * s, int c, size_t n)

Set the first n characters of object s to character c, and return a pointer to the result:

int main()

{

char x[] = "AAAA";

cout << (char *)memset(x, 'B', 2);

}

result will be "BBAA".

5.11 Error Message Method

char * strerror(int code);

Error messages have their codes. This method maps the error code into a full text string in a system dependent way, and returns a pointer to that mapped message.

6. The Preprocessor

6.1 #include

The #include preprocessor directive causes a copy of a specific file to be pasted in place of the directive before compilation. It is used when another file contains the type definition that you want to use. For global variables and functions contained in another file, you do not include that file. Instead you inform the compiler with the following line before you use it that it is a global variable or function and the compiler should find it out itself:

extern ULONG m_refCount; // global variable

extern void Lock(BOOL fLock); // global function

There are two forms of the #include directive:

#include <filename>

#include “filename”

File included in “< >” will be searched in the system’s pre-defined locations, while the file included by quotes will be first searched in the current directory, then the pre-defined locations. “< >” is used for standard library files.

Most included files are *.h header files. But other files such as *.cpp source files can also be included, as long as the definition you want to invoke.

If header file A makes use of a type defined in file B, normally it should include B before using that type. Such a file can be called “self-contained”. Not all files are “self-contained”. Some library header files are very comprehensive and contains lots of type definitions. Different definitions may use other definitions defined in other files. If we want such files to be self contained, we may virtually end up with each header file including all other header files. Considering that most clients include such a header file just to use one or two definitions in it, it will be a big waste of space. Therefore, such a header may decide not to include some other files. If you want to include use type A defined in file X, and type A uses type B which is defined in file Y, it is your duty to insure that file Y is included in front of file X:

// ******* header1.h *********

#define _STRUCT_A

struct A {

int a;

};

// ******* header2.h *********

#define _STRUCT_B

struct B {

int b;

};

// ******* header3.h **********

#if defined _ STRUCT_A

struct C {

int c;

A a;

};

#endif

#if defined _ STRUCT_B

struct D {

int d;

B b;

};

#endif

// ******** main ***********

// header1 must proceed header 3, while header 2 is not necessary

#include "header1.h"

#include "header3.h"

void main(int argc, char* argv[])

{

C c;

}

6.2 Conditional Compilation

#if expression

...

#endif

Two special expression are:

#if !defined (SYMBOL)

#if defined (SYMBOL)

Their simplified forms are:

#ifdef

#ifndef

Conditional compilation is commonly used as a debugging aid. You can first use

#define DEBUG 1

then enclose all debugging codes in

#if DEBUG

#endif

Then after debugging is finished you only need to change “1” to “0” to disable all debugging codes.

6.3 #define Symbols for Conditional Compilation

The #define preprocessor directive creates a file-scope symbolic constant. When preprocessor sees

#define <constant>

it will create a constant named "CONSTANT" in the scope of this file. Then "defined CONSTANT" will return true. Therefore, you can use the following predirectives to wrap some code:

#if !defined _ANY_SYMBOL // “#ifndef _ANY_SYMBOL” is a simplified version

#define _ANY_SYMBOL

...

#endif

The first time preprocessor sees a #include and opens this file, _ANY_SYMBOL is not defined, so the preprocessor defins _ANY_SYMBOL and goes on. Next time when preprocessor sees another include of this file and encounters the #if, it will skip the whole block. This is how you can avoid the enclosed code from being included for more than once, which may cause serious problems.

The constant can be anything. In convention we use the capitialized file name. For example, if the header file is "Employee.h", the constant may be "EMPLOYEE_H", "_EMPLOYEE", "_EMPLOYEE_H", etc. Sometimes a globally unique ID may be used so that there won't be any confusion.

A file-scope constant applies only to the code after this. To undefine it, say

#undef <constant>

A preprocessor constant can also be defined in a global scope. In Visual C++, the constants you put in the "Project | Settings | C/C++ | Preprocessor directives" field will be defined in global scope. Some commonly used constants like "WIN32" and "_DEBUG" are defined in that field. When you choose "Win32 Debug" in "Project | Settings", the "_DEBUG" constant will be added automatically. If you choose "Win32 Release", constant "NDEBUG" will be added. Then lots of Win32 macros such as assert and your own code may check this constants and decide what to behave. For example, if "NDEBUG" is defined, the code within assert will be skipped to increase performance:

void assert(...)

{

#if defined NDEBUG

// assertion work to be done

#enif

}

Look at the following example. Suppose you have two classes defined in two different header files

class A

{

public:

A()

{

#if defined _SAYHELLO

printf("Hellow from class A!\n");

#else

printf("Hi from class A!\n");

#endif

}

};

class B

{

public:

B()

{

#if defined _SAYHELLO

printf("Hellow from class B!\n");

#else

printf("Hi from class B!\n");

#endif

}

};

If you put "#define _SAYHELLO" in one of the header files, only one constructor will say "Hello" and the other will say "Hi". If you put "_SAYHELLO" into the project setting, both constructors will say "Hello".

6.4 #define for Constants vs. C++ Constants

#define MAX_POWER 15.3

#define OVERFLOW_MSG "Data overflow!"

#define constants such "MAX_POWER" can not be seen by compiler. It is C-style. C++ style constants would be using constants. Specifically, if you want to restrain the constant within the scope of a class, use a constant static class member:

private:

const static int I;

In the implementation file, you have to initialize it as

const int A::I = 3;

However, if you want to use I as array size, compiler has to know its value, and the above way does not work. You can use a "enum hack":

private:

enum {I = 5};

float a[I];

6.5 #define for Macros vs. C++ Inline Template Functions

Macros are C-style. It has two advantages: it can be applied to different types and it saves a method call. Its disadvantage is that there is no type checking. In C++ it can be replaced by template and inline methods which has all the benefits of macros and meanwhile performs type checking.

#define area(r) (3.1416 * (r) * (r))

“( )” around the argument r and the entire expression forces the proper order of evaluation when the argument is an expression.

Then when preprocessor sees a line like

area = 3.3 - area(c + 2);

it will just replace it with

area = 3.3 - (3.1416 * (c + 2) * (c + 2));

Such a C-style macro can be replaced by a C++ inline template function with all macro benefits but without macro drawback:

template <class T>

inline const T & Area(const T & r)

{

return 3.1416 * r * r;

}

If the replacement text in a macro is longer than a line, backslash “\” must be put at the end of the line to indicate that the replacement text continues in the next line:

#define check(msg) if(FAILED(hr)) \

::MessageBox(NULL, msg, "Server", MB_OK | MB_TOPMOST); \

return; \

¨ #define macros with stringizing operator #

In the macro body, if you put # operator in front of the parameter, the preprocessor will wrap the parameter with double quotes. So if a macro defines as

#define m(msg) ::MessageBox(NULL, msg, "Server", MB_OK | MB_TOPMOST);

if you say it will become

m(buf);

It will become

::MessageBox(NULL, buf, "Server", MB_OK | MB_TOPMOST);

If the macro is

#define m(msg) ::MessageBox(NULL, #msg, "Server", MB_OK | MB_TOPMOST);

if will become

::MessageBox(NULL, "buf", "Server", MB_OK | MB_TOPMOST);

¨ #define macros with token pasting operator ##

## is used to join two tokens together. So if the macro is defined as

#define m(x) parameter##x

then if you say

cout <<m(12);

it becomes

cout << parameter12;

6.6 Line Numbers

#line 100

#line 100 “file1.c”

The first line causes the following line numbers to start from 100, and all compiler messages is written into “file1.c”. It is used to make the messages produced by syntax errors and compiler warnings more meaningful.

6.7 Predefined Symbolic Constants

They start and end with two underscores:

__LINE__ line number of current source line – an integer

__FILE__ presumed name of the source file – a string

__DATE__ the date the source file is compiled – a string such as “Jan 19 1999”

__TIME__ the time the source file is compiled – a string such as “15:25:35”

__STDC__ integer constant 1. This is intend to indicate that the implementation is ANSI compliant

7. C - like Code

7.1 Redirecting Input & Output on UNIX/DOS Systems

Normally a program’s input comes from keyboard and output goes to screen. In most of the computer systems – UNIX and DOS systems in particular – it is possible to redirect inputs to come from a file, or redirect outputs to a file. They can be achieved on the system command line.

¨ Redirect Input and pipelining

Key in the following on the command line in either UNIX or DOS:

$ program-name < filename

it runs the program with its input directed to the file. "<" is called the redirect input symbol.

The other method to redirect input is pipelining. A pipeline "|" causes the output of one program to be directed as the input to another program:

$ program1 | program2

This causes the output of program 1 to become the input of program 2.

¨ Redirect Output

$ program-name > filename

it runs the program, with its output redirected to the file. ">" is called the redirect output symbol.

Using the append output symbol ">>", the program output can be appended to the end of one file:

$ program-name >> filename

7.2 Variable-Length Argument List

This feature is a C feature. In C++ we use method overloading.

To indicate that a method receives an unspecified number and type of arguments, put "..." in the end of the argument list. The header file of the macros and definitions is in <stdarg.h>.

int sum(int i, ...)

{

int sum = 0;

va_list a;

va_start(a, i);

for (int k = 1; k <= i; k++)

sum += va_arg(a, int);

va_end(a);

return sum;

}

int main()

{

int a = 1, b = 2, c = 3, d = 4;

cout << sum(1, a);

cout << sum(2, a, b);

cout << sum(3, a, b, c);

cout << sum(4, a, b, c, d);

}

va_list is a type. a is an object of va_list type, which will be used by other macros to get each argument from the list.

va_start initializes the va_list object, taking 2 arguments: the ap1 object and the last argument before the "...".

va_arg(a, int) takes two arguments: the va_list object and the type of the object expected in the argument list, and returns the next argument in the argument list.

va_end(ap1) facilitate a normal return to the caller.

7.3 Command-Line Arguments

In many systems – UNIX and DOS in particular – it is possible to pass arguments to main from a command line. This is achieved by including two parameters in main's parameter list: integer argc indicates the number of passed arguments, and argv is an array of strings to hold the arguments. This method can be used to pass information to a program, such as options and filename.

Command line:

$ program-name file1 file2

Program:

int main(int argc, char * argv[])

{

if(argc != 3)

cout << "Wrong arguments! \n";

else

{

ifstream file1(argv[1], ios::in);

ofstream file2(argv[2], ios::out);

...

}

7.4 Program Termination with Exit and Atexit

The general utilities library <stdlib> provides methods to terminate program execution other than conventional return from main.

Method exit forces a program to terminate as if it executes normally. It is often used when an error is detected in input or file can not be opened. It has one argument, which is normally one of the two symbolic constants: EXIT_SUCCESS and EXIT_FAILURE. This argument is returned to the calling environment.

Method atexit takes a method name as argument. It registers this termination method to the program, so that when the program has a successful termination – either when return from main or exit called, that termination method will be called. This method can not have arguments or return type.

7.5 Type Qualifier volatile

Type qualifier volatile is used on the definition of a variable, to notify the compiler that this variable may be altered from outside the program so it is not under complete control of the local program. Thus, compiler can not perform optimization on it.

7.6 Suffixes for Primitives

u or U unsigned. 174U

l or L long integer or long double 35L, 78.3L

ul or UL unsigned long 2387UL

f or F float 33.2F

7.7 Signal Handling

Unexpected events, also called signals, can terminate a program prematurely. For some of these signals, symbolic constants had been defined in signal handling library <signal.h>:

1. SIGABRT Abnormal termination of the program such as a call to abort

2. SIGFPE An erroneous arithmetic operation such as divide-by-zero or overflow

3. SIGILL Detection of an illegal instruction

4. SIGINT Receipt of an interactive attention signal

5. SIGSEGV An invalid access to storage

6. SIGTERM A termination request sent to the program

When the event happens, the corresponding signal happens.

The signal handling library provides method signal to catch the signal. It has two arguments: the name of the signal and the name of a signal handling method. It registers a signal handling method for a specific signal to the program, so that when that signal happens, that corresponding method will be called to handle that signal.

#include <iostream>

#include <signal>

void handling(int signal)

{ cout << "Signal SIGFPE is trapped! \n"; }

int main()

{

signal(SIGFPE, handling);

int a = 3, b;

cout << "Please input b as 0: \n";

cin >> b;

cout << a/b << endl;

}

7.8 C-Style Dynamic Memory Allocation

In addition to malloc and free, the general utilities library <stdlib> provides two other methods for dynamic memory allocation – calloc and realloc, to create and modify dynamic arrays.

Method calloc dynamic allocates memory for an array:

void * calloc(size_t number, size_t size);

the first argument is the number of elements and the second is the size of each element. Altogether it determines the memory size needed. All the elements are initialized to 0. The method returns a pointer to the allocated memory, or NULL if memory is not allocated.

Method realloc changes the size of an array created by calloc:

void * realloc(void * ptr, size_t size);

the first argument is a pointer to the original object, second is the new size. The method returns either a pointer to the new memory or NULL if not successful. If the memory is reduced, the untouched part of memory remain unchanged.

If ptr is 0, it works exactly as calloc. If size is 0, all the memory is freed.

7.9 Unconditional Branch: Goto

There are mainly two situations that you need to use this unstructured programming technique:

1. Performance is more important than the structure of the program;

2. To exit nested control structures.

Try not to use goto except for this two reasons. Unstructured programming can lead to programs which are more difficult to debug, maintain and modify.

A goto statement is used together with a label. A label is an identifier followed by a colon.

int main()

{

fstream file("test.txt", ios::out);

for (int i = 0; i < 4; i++)

{

file << "Outer loop: i = " << i << endl;

for (int j = 0; j < 4; j++)

{

file << "Inner loop: j = " << j << endl;

if(i == 2 && j == 2) goto bottom;

}

file << endl;

}

bottom: file.close();

return 0;

}

Output will be:

** Outer loop: i = 0

Inner loop: j = 0

Inner loop: j = 1

Inner loop: j = 2

Inner loop: j = 3

** Outer loop: i = 1

Inner loop: j = 0

Inner loop: j = 1

Inner loop: j = 2

Inner loop: j = 3

** Outer loop: i = 2

Inner loop: j = 0

Inner loop: j = 1

Inner loop: j = 2

7.10 Union

A union can have several members of different types including user-defined types. But these different members are actually sharing the same block of memory space. At one time a union can only hold one member.

At different times during a program’s execution, not all objects are relevant in the meantime. So a union combines a group of objects which are never used at one time to save space. The number of bytes used to store a union is at least the size of the largest member.

A union is declared in the same way as a struct or a class:

union Number {

int x;

float y;

};

Number n1;

n1.x = 33;

The following operations can be performed on an union:

1. Assignment (between same type of unions);

2. Take reference (&);

3. Access union members through “.” and “->”;

Unions can not be compared with each other for the same reason as structures.

An union can also have constructor, destructor or other methods, just like a class. But there is no inheritance or virtual issues for union.

An anonymous union is a union without a type name. Such a union can not be used to define objects later, but itself defines an object, whose members can be directly accessed in its scope just like normal variables, without using “.” or “->” operators:

union {

int x;

float y;

};

x = 33;

7.11 Linkage Specification extern "C"

This is to tell the compiler that the following function is not defined in this program, and it is the compiler's job to look for the definition. This is also to inform the compiler that the following function was compiled as C code. C++ specially encodes method names for type-safe linkage, but C doesn’t. So when an attempt is made to link C code with C++ code, the method compiled in C will not be directly recognized.

For a single method:

extern “C” method prototype

For multiple methods:

extern “C”

{ method prototypes }

7.12 __stdcall

This is to indicate the calling convention which is used to call Win32 API functions. The callee cleans the stack, so the compiler makes vararg functions __cdecl. Functions that use this calling convention require a function prototype. The following list shows the implementation of this calling convention:

Argument-passing order: Right to left.

Argument-passing convention: By value, unless a pointer or reference type is passed.

Stack-maintenance responsibility: Called function pops its own arguments from the stack.

Name-decoration convention: An underscore (_) is prefixed to the name. The name is followed by the at sign (@) followed by the number of bytes (in decimal) in the argument list. Therefore, the function declared as int func( int a, double b ) is decorated as follows: _func@12

7.13 typedef

typedef is used to assign an alias to an existing data type. Then this synonym can be used in place of the original data type:

typedef Card * PCARD;

...

PCARD pCard;

typedef can make a program easier to understand and modify, more portable. It can also be used to represent a very long type. In C programming, especially, if you define a struct in the following way

struct Employee {

LPSTR name;

LPSTR dept;

short age;

};

When you create an instance of the struct, you have to say

struct Employee e1;

While if you use typedef to define the structure:

typedef struct {

LPSTR name;

LPSTR dept;

short age;

} Employee;

when you create an instance you can say

Employee e1;

8. Class string & String Stream Processing

8.1 Basics of string

string s1("Hello!"); // create a string "Hello!"

string s1 = "Hello!";

string s1(5, 'X'); // create a string "XXXXX"

s1 = 'a';

Constructing a string that is too long will throw a length_error exception.

Unlike C-style string “char *”, string is not required to end with NULL. NULL is regarded as a normal character in string.

Stream insertion and extraction operator (<< and >>) and method getline have been overloaded for string.

8.2 String Assignment

Assignment such as “s1 = s2” is not allowed for char * but allowed for string, because string is a class and operator = has been properly overloaded. Or you can explicitly call string’s assign method

s1.assign(s2);

s1.assign(s2, start, number);

The second call assigns “number” of characters of s2 to s1, starting from subscript "start":

A character in a string can be accessed by

s1[2] = 'a'; // or

s1.at[2] = 'a';

[ ] operator doesn't provide range check, while at method does.

8.3 Creating Strings

There are two ways to create a string:

string name(“Frank Liu”);

string name = “Frank Liu”;

The first one is more efficient than the second, becasue the first one only calls class string’s constructor, while the second one calls string’s default constructor first, then its assignment operator.

8.4 Appending Strings

string s1 = “Frank”;

string s2 = s1 + “ Liu”;

s2 will be “Frank Liu”. Or you can explicitly call string’s append method:

s1.append(" Liu"); // or

s1.append(s2, start, end);

The second call appends part of s2 to s1, from subscript “start” to “end”.

8.5 Comparing Strings

The following operation can be done:

> & >=

< & <=

Or you can explicitly call string’s compare method

int result = s1.compare(s2);

which returns the result as positive if s1 > s2, zero if s1 == s2, negative if s2 < s2.

result = s1.compare(start1, length, s2, start2, length); // or

result = s1.compare(start1, length, s2);

compare part of s2 with part of s1. s1 starts from "start1", s2 starts from "start2", number of characters is "length".

8.6 Sub-string

cout << s1.substr(start, length);

method substr returns a string made from s1, starting from "start" and number of characters is "length".

8.7 Swapping string

s1.swap(s2);

swaps the value of s1 with s2.

8.8 Characteristics of string

Class string has the following methods to return the characteristics of a string:

1. capacity returns the number of characters that can be stored in s1 without increasing the memory capacity.

2. max_size returns the size of the largest possible string that can be stored in a string object.

3. size and length returns the number of characters currently stored in s1.

4. empty returns true if s1 is empty.

8.9 Finding Character in string

string::npos is a public static constant defined in class string, representing the maximum string length.

int result = s1.find("ABC");

looks for string "ABC" in string s1, and returns the subscript of the location, or string::npos.

result = s1.rfind("ABC");

searches string s1 backwards.

result = s1.find_first_of("ABC");

searches for the first occurrence in s1 of any character contained in "ABC". Searching starts from the beginning of s1.

result = s1.find_last_of("ABC");

searches for the last occurrence in s1 of any character contained in "ABC". Searching starts from the end of s1.

result = s1.find_first_not_of("ABC");

searches for the first character in s1 which is not contained in "ABC".

result = s1.find_last_not_of("ABC");

searches for the last character in s1 which is not contained in "ABC".

8.10 Replacing Characters in a string

s1.erase(23);

erases characters form 23 to the end.

s1.replace(start, number, "ABC");

replaces part of s1 with string "ABC", starting from "start", number of characters is "number".

s1.replace(start1, number1, s2, start2, number2);

replaces part of s1 with part of s2.

8.11 Inserting Characters into a string

s1.insert(before, s2);

inserts s1 into s1 before "before".

s1.insert(before, s2, start, number);

insert part of s2 into s1 before "before".

8.12 Conversion to C-Style char *

const char * ptr = s1.data();

returns the content of s1 to ptr, without terminating NULL character. So you must add NULL to the end of string ptr.

s1.copy(ptr, length, 0);

copy s1 to ptr, without terminating NULL character.

s1.c_str() returns a NULL-terminated const char *.

Whenever possible, always use string instead of C-style char *.

Converting a string containing NULL characters to C-style char * may cause logic errors.

8.13 string Iterators

string::const_iterator i1 = s1.begin();

cout << *i;

8.14 String Stream Processing

In addition to standard stream IO and file stream IO, stream IO includes capabilities for inputting from string and output to string in memory. These capabilities are often referred to as in-memory IO or string stream processing.

Input from string is supported by class istringstream, while output by class ostringstream. To use them you must include <sstream> and <iostream>.

One application of these techniques is data validation. You first input from standard input stream to a string, then do some modification or validation on it, then input from that string.

string s1 = "Hello! \n";

string s2 = "Good Morning! \n";

string s3 = "How are you? \n";

ostringstream output;

output << s1 << s2 << s3 << endl;

istringstream input("How are you?");

input >> a >> b;

9. Standard Template Library (STL)

9.1 Introduction to Container

Containers are objects that contains objects. In STL, containers only encapsulates some primitive operations. The algorithms are independent of the containers. They manipulate containers using iterators.

¨ Containers Categories

w Sequence Containers: organize a collection of objects into a strictly linear arrangement. There are four types of sequence containers: normally arrays, vectors, lists and deques.

w Sorted Associative Containers: a collection which is kept sorted with keys. There are four kinds of associative containers: sets, multisets, maps and multimaps.

w Adapters: inherit from the two first-class containers and modify their interfaces. There are three major types: stack, queue, priority_queue

¨ Common methods of all containers

w default constructor, copy constructor, destructor;

w operator =, <, <=, >, >=, ==, !=;

w empty: return true if container has no element

w max_size: maximum number of elements for a container

w size: number of elements currently in the container

w swap: swaps the elements

¨ Common methods of all containers

w begin: returns an iterator or const_iterator to the first element of the container

w end: returns an iterator or const_iterator to the next position after the end of the container

w rbegin: returns an reverse_iterator or const_reverse_iterator to the last element of the container.

w rend: returns an reverse_iterator or const_reverse_iterator to the position before the first element of the container

w erase: erases one or more elements from the container

w clear: erases all elements from the container

¨ Type of Container Element

Container element can be of any type, both built-in types or user-defined types. However, user-defined types which is to be stored in containers must support a minimum set of methods. When an element is inserted into a container, a copy of the element is made. If default memberwise copy can not do the job, then the type must provide its own copy constructor and assignment operator. Also, associative containers and many algorithms require elements to be compared. So the element type should provide overloaded operator = = and <.

9.2 Iterator

Iterators are sophisticated pointers pointing to container elements. Except for vectors which can manipulate its elements through integer subscripts (such as v[1], v[2]), all container methods and STL algorithms manipulate container elements through iterators.

There are two kinds of container methods: methods which return iterators and methods which modifies the elements.

1. Iterator-returning methods return iterators pointing to some special elements such as the first or the last element, or an element which conforms to a certain rule (e.g., the one pointing to an element which is equal to the passed object). With these returned iterators, container's element modifying methods and STL algorithms can then modify the elements through them.

2. Element-modifying methods have quite limited functionality – much of the job is done through independent STL algorithms, which are more abstracted from containers.

Iterators are designed to act as a medium to separate "how" from "what": new STL algorithms and client applications are supposed to be developed with only iterators, without knowing the type of the container and the type of its elements. Java achieved this goal. All containers hold only one type: class Object, and all containers use one type of iterator.

However, C++ did not achieve such a complete abstraction. When you create an iterator, you have to specify the type of container it belongs to – a vector, a map, a multiset, etc., and the type of the container element - integer, string, or other user-defined types.

We can not create an iterator pointing to a certain element in a container. We can only ask a container to return an interator pointing to a certain element.

Normal arrays are random access containers, pointer to arrays i.e. array names are random access iterators.

9.3 Iterator Types

1. iterator: refer to objects which can be modified

2. const_iterator: refer to objects which can not be modified

3. reverse_iterator

4. const_reverse_iterator

To create an iterator, Three kinds of information need to be provided: container type, element type and iterator type. It is not abstracted enough.

vector<int>::reverse_iterator p1;

9.4 Iterator Capabilities

1. Input iterator: used to read an element from a container. It can only move in the forward direction one element at a time, and only support one-pass algorithms.

2. Output iterator: used to write an element to a container. It can only move in the forward direction one element at a time, and only support one-pass algorithms.

3. Forward iterator: combines the capabilities of the input and output iterator, and retains their position in the container as state information.

4. bidirectional iterator: add the ability to move in both backward and forward directions.

5. Random access iterator: add the ability to directly access any element – such as a pointer to an array.

The capability of an iterator is is decided by the container to which it belongs. Different kinds of containers provide iterators of different capabilities.

9.5 Iterator Capabilities Supported by Different Containers

1. vector: random access

2. list: bidirectional

3. deque: random access

4. set: bidirectional

5. multiset: bidirectional

6. map: bidirectional

7. multimap: bidirectional

8. stack: no iterator supported

9. queue: no iterator supported

10. priority_queue: no iterator supported

Different types of STL algorithms requires iterators of different capabilities.

9.6 IO Iterators

There are other two types of iterators which do not belong to any container: istream_iterator and ostream_iterator. They are used to input and output a certain type of object in a type-safe manner from/to an IO object such as cin and cout.

9.7 Operations Supported by Different Iterators

¨ All iterators:

++p

p++

¨ Input iterators:

p1 = p2

p1 == p2

p1 != p2

¨ Output iterators:

p1 = p2

¨ Forward iterators:

provide all the methodality of both input and output iterators.

¨ Bidirectional iterators:

-- p

p —-

¨ Random access iterators

p + i

p – i

p += i

p -= i

p[i]

p1 < p2

p1 <= p2

p1 > p2

p1 >= p2

9.8 Sequence Containers

Sequence containers include vector, list and deque. vector stores elements contiguously in memory, list is linked with double pointers, and deque combines the advantages of vector and deque.

They have some common methods:

1. front: return an iterator to the first element in the container

2. back: return an iterator to the last element in the container

3. push_back: insert a new element at the end of the container

4. pop_back: remove the last element of the container

5. insert: insert one or a range of elements into the container – before the indicated location.

9.9 vector

vector is the most commonly used container in STL.

Class vector provides a data structure with contiguous memory locations. This enables efficient, direct access to any element via subscript operator [ ] like arrays. So a vector is a more intelligent and complex array.

A vector returns random access iterators.

Because only random access iterators support “<” operator. So it is always safer to use “!=” operator, which is supported by all iterators except for output iterators.

Header file of vectors is <vector>.

¨ Element insertion and deletion

Because vector elements occupies contiguous memory, it is OK to insert or delete new element at the back, but expensive at the front or middle, for the entire portion of the vector after the inserted or deleted element have to be moved to keep it contiguous.

Suppose you have a vector of 9 elements, now you delete the 5th. The vector will call element’s assignment operator to assign the 6th element to the 5th, the 7th to 6th, ... , finally the 9th to the 8th. Then it will delete the last element. So you can see, to keep a contiguous memory a great deal of work needs to be done if you delete or add from the middle. In contrast, for a linked list, all you need to do is to point the pointer in the 4th element to the 6th. Therefore, if you need to frequently insert and delete from the middle, use a linked list.

When an element is inserted, the compiler will first call copy constructor to create a new element at the end, and use assignment operator to assign the second last to the last, and so on, and finally assign the object which is to be inserted to the original object which is at the insertion point.

vector's elements can be accessed with subscription just like arrays. Operator [ ] does not perform range check, but method at does.

If a new element is added to a full vector, the vector increases its size automatically – some would double its size, so would increase a certain amount.

¨ Two more methods of its own

vector has two more methods of its own:

1. capacity: vector’s capacity is not always its number of elements. When a full vector receives a new element, it may double its size. So if a vector has 4 elements and one is added, it will have a size of 5 and capacity of 8.

2. resize: if you think the doubled capacity consumes too much memory, you can use it to resize the vector.

¨ Sample codes and explanations:

const int SIZE = 6;

int a[SIZE] = {1,2,3,4,5,6};

vector<int> v1; // create an empty vector

vector<int> v2(a, a + size); // create a vector using part of an array

v1.push_back(11); // insert an element at the back of the vector

v1.push_back(22);

vector<int>::iterator p1; // create an iterator

// traverse a vector with iterator:

for(p1 = v1.begin(); p1 != v1.end(); p1++)

cout << *p1;

vector<int>::reverse_iterator p2; // create a reverse_iterator

// traverse a vector with reverse_iterator

for(p2 = v2.rbegin(); p2 != v2.rend(); p2++)

cout << *p2;

cout << v2.size(); // size of vector

cout << v2.capacity(); // capacity of vector

v1[0] = 7; // set first element to 7 – no range checking

v1.at(2) = 7; // set third element to 7 – with range checking

If the argument at receives is out of range, it will throw a “out_of_range” exception, which is in <stdexcept>.

v1.insert(v1.begin()+3, 22); // insert 22 as 4th element

v2.insert(v1.begin()+3, v2.begin(),v2.begin()+5);

This is to insert the first 6 elements of vector v2 into the vector v1, to start as from the 4th element of vector.

istream_iterator<int> input(cin);

int a, b;

a = *input;

++ input;

b = *input;

This is to declare an istream_iterator to input integers from cin.

ostream_iterator<int> output(cout, “ “);

*output = a + b;

This is to declare an ostream_iterator to output integers to cout. Each outputted value is to be separated by a space character specified as the second argument.

v1.erase(v1.begin()+3); // remove the 4th element from vector

v1.erase(v1.begin(), v1.end()); // remove a range of elements from vector

v1.clear(); // remove all elements from vector

9.10 Binary Predicate Function

A function supplied as an argument to other functions such as container methods and STL algorithms, which takes two arguments, performs a comparison, and returns a bool value indicating the result. The algorithms only call the passed function to perform the comparison, but how to implement the comparison is customized with this function. This technique is also called “call back”, which is an effort to separate “what to do” from “how to do”.

template < class T >

bool myCompare(T a, T b)

{ return a < b; }

9.11 List

Class list is implemented as a doubly-linked list, so it can not be randomly accessed, and it only supports bidirectional iterators. As said before, because the list elements are not stored contiguous and only connected one by one through doubly links, it is convenient to insert or delete an element at any location of the list.

Header file of lists is <list>.

¨ Methods

1. splice: remove elements from a list and insert into another

2. push_front: insert an element at the front

3. pop_front: remove an element at the front

4. remove

5. unique

6. merge

7. reverse

8. sort

¨ Sample program

list<int> l1, l2; // create an empty list

l1.sort();

Method “sort” sorts the elements in the list in ascending order by calling element's operator "<". If the element type does not provide a "<" operator, or you want to compare two elements in a different way, or the list elements are pointers to other objects, you will need to provide a comparing method yourself. The second version of “sort” allows the programmer to supply a binary predicate function.

l1.splice(l1.end(), l2);

This is to remove all the elements of l2 and insert them into l1 before the iterator position, which is the end.

l1.splice(l1.end(), l2, l2.begin() + 1);

This is to remove the l2’s second element, and insert it into l1 as the last element.

l1.splice(l1.begin(), l2, l2.begin(), l2.begin() + 2);

This is to remove l2’s first three elements and insert them into l1 at the beginning.

l1.merge(l2);

This is to remove all elements from l2 and insert them into l2, and the result is in sorted order. Before this operation, the two lists must already be in sorted order. The second version of “merge” takes another argument – a binary predicate method to determine the actually sorting order.

l1.unique();

This is to remove duplicated elements in the list. Before this operation the list must already be in sorted order. A second version of “unique” method takes an argument as a predicate method, to determine whether two elements are equal.

l1.swap(l2); // exchange the contents of l1 with l2

l1.assign(l2.begin(), l2.end());

This is to replace the content of l1 with content of l2 in the specified range.

v1.remove(44); // remove all elements with value 44

You can see that list has much more methods than vector. It is because vector can access its elements randomly via subscripts, most of these functionality can be done very easily by clients. There is no need to provide such methods.

9.12 Deque

Class deque is designed to combine the advantages of vector and list together. Like a vector, a deque can be randomly accessed via subscripts, and like a list, elements can be conveniently inserted and deleted at both ends of the deque. Because of this combination, a deque iterator must be more intelligent than a vector iterator. Insertion and deletion at the middle of a deque is optimized to minimize the number of elements copied.

Header file of deque is <deque>.

deque has two more methods of its own:

1. push_front

2. pop_front

9.13 Comparison of vector, list and deque

Vector has the best random access performance. So a vector is always the first choice if delete and insert only happens at the back of the collection. If insertion and deletion frequently happens at both ends, a deque is preferred than a list because it is more efficient. If frequent insertion and deletion also happens in the middle of the collection, then we should use a list.

9.14 Associative Containers

All associative containers store and retrieve elements essentially in pairs: one is the key, one is the value. Multisets and sets use their values as keys, while multimaps and maps use a separate key for each of the value.

In an associative container, the way to quickly search for an element by its key is to put all the keys and the addresses of the elements or records in a look-up table arranged with searching algorithms such as binary tree, b tree or b+ tree. The size of the look-up table should be minimal to speed up the searching process, so the size of the key itself must be minimal. Because sets uses elements themselves as keys, its elements must be of minimal size. If the element size is too big to be a key, then you should use a separate key to represent the value, that is to say, you have to use maps. The separate key may very probably be a field of the element, such as the employee number of class Employee.

Therefore, the key difference between sequential containers and associative containers is: sequential container elements are stored in by themselves, while associative container elements are stored with a look-up table.

Regardless of the sequence in which the elements are inserted, they are always in sorted order.

¨ Common Methods

1. find

2. lower_bound

3. upper_bound

4. count

Because associative containers can only be accessed through keys, all their methods are key-related.

9.15 Class pair

Class pair has two public data members of any two types:

template<class T1, class T2>

class pair {

public:

T1 first;

T2 second;

pair() {}

pair(const T1 & x, const T2 & y)

: first(x), second(y) {}

};

It is used to store a pair of values so that a method can return a pair of values. In <bool>.

9.16 multiset

The elements themselves are used as keys. The ordering of the elements is decided by a comparator method object such as “less<Type>”.

The type of the key must support appropriate comparison, e.g., keys sorted with “less<Type>” must support operator<.

A multiset container supports bidirectional iterators.

Header file of multiset is <set>.

¨ Sample Program

typedef multiset<int, less<int> > ims1; // declare an alias of multiset type

ims1 m1; // create an empty multiset

cout << m1.count(15); // count the number of keys with value of 15

m1.insert(15);

This is to insert 15 into the multiset. The second version of insert takes an iterator and a value as arguments and begins the search for the insertion point from the iterator position specified.

m1.insert(a, a + SIZE);

The above third version takes two iterators as arguments to specify a range of values from another container to add into the multiset:

ims1 const_iterator result;

result = m1.find(15);

Method “find” returns an iterator or a const_iterator pointing to the first element with value 15. If not found, it will return an iterator to the position after the last element.

cout << *( m1.upper_bound(15) ) << *( m1.lower_bound(15) );

Method “upper_bound” returns an iterator or const_iterator to the first element with value 15, while method “lower_bound” returns an iterator or const_iterator to the location after the last element with value 15. If not found, they both return an iterator to the position after the last element.

pair< ims1::const_iterator, ims1::const_iterator > p1;

p = m1.equal_range(22);

cout << *(p1.first) << *(p1.second);

Method “equal_range” returns a pair object containing the lower_bound and upper_bound of 22. Then we can access the two elements through the two iterators stored in the pair.

9.17 set

The methods of set are identical to multiset, except that a set must not have duplicated keys.

typedef set< double, less<double> > double_set;//define an alias of a set type

double_set s1(a, a + size); // create a set out of part of an array

pair< double_set1::const_iterator, bool > p;

p = s1.insert(13.8);

Method "insert" will first search for the element of 13.8. If element found, it will return an iterator to the found element and a false value to indicate that the value was not inserted. If the element is not found, it will insert that value in the set, return an iterator to the inserted element and a true value to indicate the successful insertion.

9.18 multimap

Elements of maps are sorted and organized very similar to sets. Many of their methods are the same. The only difference is that in a map a separate key is used to represent a value. Both the key and the value can be of any type. Just as the sets, the ordering of the keys is determined by a predicate method such as less<Type>. The maps support bidirectional iterators.

Multimaps allows duplicated keys, so you can have duplicated values with the same key. This is called "one-to-many" relationship.

typedef multimap< int, double, less<int> > mmid1;

mmid1 m1;

m1.insert( mmid1::value_type( 15, 2.73 ) );

"value_type" is one of those pre-defined types just like iterators, const_iterators. It represents the type used in the container.

for(mmid1::const_iterator i = m1.begin(); i != m1.end(); i++)

cout << i->first << i->second;

This is to traverse the multiset with a const_iterator and print out the key and the value of each element.

Header file of both multimap and map is <map>.

9.19 map

Duplicated keys are not allowed in a map, so only a single value can be associated with each key. This is called a "one-to-one mapping".

Because of this "one-to-one mapping", you can specify the key and get back the associated value quickly. A map is also called an "associative array", for you can provide the key in subscript operator [] and locate the element. Insertion and deletion can be done anywhere efficiently.

typedef map< int, double, less<int> > mid;

mid m1;

m1.insert( mid::value_type(15, 2.73) );

m1[13] = 8.93;

When key 13 is in the map, operator[ ] returns a reference to the element, so that it can be assigned 8.93. If key 13 is not in the map, operator[ ] inserts the key and returns a reference to the value.

9.20 Container Adapters

Container adapters are implemented upon first-class containers, just like shrinking inheritance. Some extra implementations can also be added to achieve more specific task, such as the sorting of priority_queue. They don't support iterators. There are three types of adapters: stack, queue and priority_queue. Their common methods are push and pop.

9.21 Stack

A stack enables insertion and deletion at the same end of the underlying data structure, commonly referred to as a last-in-first-out data structure. A stack can be implemented with any of the sequence containers: vector, list and deque. By default it is deque. For best performance, use vector or deque as the underlying container.

All stack methods are implemented as inline to avoid an extra method call.

Header file of stack is <stack>.

¨ Methods

1. push: insert an element by calling underlying container method push_back

2. pop: remove an element by calling pop_back

3. top: return a reference to the top element by calling back

4. empty: determine if the stack is empty by calling empty

5. size: return the size of the stack by calling stack

¨ Sample Program

stack<int> s1; // using a deque<int> as underlying container

stack<int, vector<int>> s2; // using a vector<int> as underlying container

stack<int, list<int>> s3; // using a list<int> as underlying container

9.22 queue

A queue enables insertion at one end of the underlying container, and deletion from the other end. This is referred to as FIFO data structure. A queue can be implemented with deque and list. By default it is deque. It can perform better than list.

All queue methods are implemented as inline to avoid an extra method call.

Header file of queue is <queue>.

¨ Methods

1. push: insert at the back by calling underlying container method push_back

2. pop: remove at the front by calling pop_front (that is the reason why vector can not be used)

3. front: return a reference to the first element by calling front

4. back: return a reference to the last element by calling back

5. empty

6. size

9.23 priority_queue

A priority_queue can be implemented with vector and deque. By default it is vector. A priority_queue is a sorted vector. When adding elements to priority_queue, the elements are automatically inserted in priority order, so that the highest priority element i.e. the largest element is always the first to be removed. This is accomplished by using a sorting technique called "heapsort", and such a data structure is called a "heap".

The ordering of the elements is performed with comparator method object less<Type>. Programmer can also supply their own comparator.

¨ Methods

1. push: insert an element at the proper location, by calling push_back then sorting the container using heapsort

2. pop: remove the highest-priority element by calling pop_back

3. top: return a reference to the top element by calling front

4. empty

5. size

9.24 Introduction to Algorithms

Only some basic functionality is implemented through container’s methods, while most of the implementations are provided by independent STL algorithms. Such separation makes it easier to add new algorithms without modifying containers.

Algorithms operate on containers through iterators, and many of them use a pair of iterators specifying a range of elements in a container. This pair of iterators is most commonly provided by container method begin and end.

Header file of algorithms is <algorithm> or <algo>.

¨ Common Non-mutating Algorithms

w adjacent-find

w count

w count-if

w equal

w find

w for_each

w mismatch

w search

w search_n

¨ Numerical algorithms from <numeric>

w accumulate

w inner_product

w partial_sum

w adjacent_difference

9.25 Fill Container with Objects

fill(c1.begin(), c1.end(), ‘A’);

set every element to be ‘A’. Takes at least forward iterators.

fill_n(c1.begin(), 5, ‘A’);

set the first 5 elements to be ‘A’. Takes at least output iterators.

generate(c1.begin(), c1.end(), myFill);

char myFill() // generator method

{

static char c = ‘A’;

return c++;

}

set all elements of the container with objects provided by a generator method. Takes at least forward iterators.

generate_n(c1.begin(), 5, generate);

set first 5 elements. Takes at least output iterator.

9.26 Compare Elements

bool result = equal( c1.begin(), c1.end(), c2.begin() );

compare elements of two containers using their operator = =. Takes at least input iterators.

typedef vector<int>::iterator vv1;

pair<vv, vv1> location;

location = mismatch( c1.begin(), c1.end(), c2.begin() );

compare elements of two containers using their operator = =, and return a pair of iterators pointing to the mismatching element in both containers. If all equal, the pair will be equal to the last iterator. Takes at least input iterators.

char c1[8] = “Hello”, c2[8] = “Good-bye”;

bool result = lexicographical_compare(c1, c1 + 8, c2, c2 + 8);

return true if first is greater than second.

if( includes(a1, a1 + SIZE1, a2, a2 + SIZE2) )

method “includes” compares the two sets of sorted values, and returns true if set is include in set 1. Takes at least input iterators.

9.27 Remove Elements

vector<int>::iterator i = remove( c1.begin(), c1.end(), 10);

eliminate all elements with value of 10. It doesn’t erase the element like erase method, it only move all untouched elements forward, leaving all eliminated elements at the back, with undefined values. It returns an iterator pointing to the last retained element. So the size of the container is not changed. Takes at least forward iterators.

i = remove_copy(c1.begin(), c1.end(), c2.begin(), 10);

copy all elements that DO NOT have the value of 10 from container c1 into container c2. It returns an iterator to the last copied element in c2. The first two arguments must be at least input iterators, while the third must be at least output iterators.

i = remove_if(c1.begin(), c1.end(), myJudge);

bool myJudge(int x)

{ return x > 9; }

eliminates all elements that the user-defined unary predicate method will return true. It does the same thing on eliminated elements like remove. Takes at least forward iterators.

i = remove_copy_if(c1.begin(), c1.end(), c2.begin(), judge);

combination of remove_if and remove_copy.

9.28 Replace Elements

replace(c1.begin(), c1.end(), 10, 100);

replace all elements with a value of 10 with 100. Takes at least forward iterators.

replace_copy(c1.begin(), c1.end(), c2.begin(), 10, 100);

copy all elements in c1 in the specified range into c2, replacing all values of 10 with 100. Returns an iterator to the last copied element in c2. The first two arguments must be at least input iterators, while the third must be at least output iterators.

replace_if(c1.begin(), c1.end(), judge, 100);

replace all elements that the user-defined unary predicate method returns true with value 100. Takes at least forward iterators.

replace_copy_if(c1.begin(), c1.end(), c2.begin(), judge, 100);

combination of replace_if and replace_copy.

9.29 Mathematical Algorithms

random_shuffle(c1.begin(), c1.end());

randomly shuffle the elements in the specified range. Takes random-access iterators.

int result = count(c1.begin(), c1.end(), 8);

count the number of elements with value of 8. Takes at least input iterators.

result = count_if(c1.begin(), c1.end(), myJudge);

count the number of elements that the user-defined unary predicate method will return true. Takes at least input iterators.

vector<int>::iterator i1, i2;

i1 = min_element(c1.begin(), c1.end());

i2 = max_element(c1.begin(), c1.end());

return an iterator to the smallest/largest element in the specified range. Takes at least input iterators.

result = accumulate(c1.begin(), c1.end(), 0);

sums up the elements, with initial value as the third argument. Operator += should be implemented. Takes at least input iterators.

result = accumulate(c1.begin(), c1.end(), myCalculate, 0);

int myCalculate(int accumulation, int element)

{ return accumulation += element * element; }

A second version of accumulate takes as its third argument a name of a user-defined method which determines how to accumulate.

for_each(c1.begin(), c1.end(), myMethod);

applies myMethod which takes one element as argument to each element. The method can not modify the elements. If you want to modify, use method transform. Takes at least input iterators.

transform(c1.begin(), c1.end(), c2.begin(), myMethod);

applies myMethod to each of the elements in c1, and place result in c2. c2 can be also c1. Takes at least input iterators.

9.30 Searching and Sorting Elements

vector<int>::iterator i;

i = find(c1.begin(), c1.end(), 16);

return an iterator to the element of value 16. Takes at least input iterators.

i = find_if(c1.begin(), c1.end(), myJudge);

return an iterator to the element that the user-defined unary predicate method returns true. Takes at least input iterators.

sort(c1.begin(), c1.end());

sort the elements in ascending order. A second version takes a third argument as a binary predicate method which returns a bool type to decide the sorting order.

bool result = binary_search(c1.begin(), c1.end(), 223);

use binary search to determine whether 223 is in the specified range.

9.31 Swap Elements

swap(a[0], a[1]);

swap the two arguments.

iter_swap(v1.begin(), v1.begin + 3);

exchange the first and fourth element.

swap_ranges(a, a+5, a+10);

exchange a range of elements from a to (but not including) a + 5, with a range of elements starting from a + 10. Taks three forward iterators.

9.32 Copy Elements

copy_backward(v1.begin(), v1.end(), v2.end());

copy a range of elements from one container to another, starting from the element before v1.end to v1.begin, returning an iterator at the last elements copied (v1.begin). Takes three bidirectional iterators.

The main difference between algorithm copy and copy_backward is: copy returns an iterator after the last copied element, while copy_backward returns an iterator at the last copied element.

unique_copy(v1.begin(), v1.end(), back_inserter(v2));

use back_inserter to insert all unique values into container v2. Takes at least output iterators.

reverse_copy(v1.begin(), v1.end(), back_inserter(v2));

make a reversed copy of v1 and insert into v2. First two iterators to be at least bidirectional, and third to be output.

i = set_difference(v1.begin(), v1.end(), v2.begin(), v2.end(), v3);

copy the elements in v1 which are not in v2 into v3. Both v1 and v2 must be in ascending order. It returns an output iterator at the last copied element in v3. The first four iterators must be at least input iterators, while the last output iterator.

i = set_intersection(v1.begin(), v1.end(), v2.begin(), v2.end(), v3);

copy the elements in v1 which are also in v2 into v3. Both v1 and v2 must be in ascending order. It returns an output iterator at the last copied element in v3. The first four iterators must be at least input iterators, while the last output iterator.

i = set_symmetric_difference(v1.begin(), v1.end(), v2.begin(), v2.end(), v3);

copy elements in v1 that are not in v2 and elements in v2 that are not in v1 into v3. Both v1 and v2 must be in ascending order. It returns an output iterator at the last copied element in v3. The first four iterators must be at least input iterators, while the last output iterator.

i = set_union(v1.begin(), v1.end(), v2.begin(), v2.end(), v3);

copy elements that are either in one of v1 and v2 or in both of them into v3. Elements that are both in v1 and v2 are only copied from v1. Both v1 and v2 must be in ascending order. It returns an output iterator at the last copied element in v3. The first four iterators must be at least input iterators, while the last output iterator.

9.33 Merge Containers

merge(v1.begin(), v1.end(), v2.begin(), v2.end(), v3.begin());

combine two sorted ascending sequences of values into a third sorted ascending sequence. The first four iterators must be at least input iterators, while the last output iterator.

merge(v1.begin(), v1.end(), v2.begin(), v2.end(), back_inserter(v3));

merge(v1.begin(), v1.end(), v2.begin(), v2.end(), front_inserter(v3));

merge(v1.begin(), v1.end(), v2.begin(), v2.end(),inserter(v3, v3.begin() + 2));

template method back_inserter is in <iterators>. It calls the container’s method push_back to insert an element at the back of the container v3. There are two other inserters.

inplace_merge(v1.begin(), v1.begin + 5, v1.begin() + 10);

Reorders the sequences designated by iterators in the two ranges [0, 5) and [5, 10), each ordered by operator<, to form a merged sequence of length 10 beginning at first, also ordered by operator<. The merge occurs without altering the relative order of elements within either original sequence. Moreover, for any two elements from different original sequences that have equivalent ordering, the element from the ordered range [0, 5) precedes the other. Takes at least bidirectional iterators.

9.34 Unify Elements

i = unique(v1.begin(), v1.end());

after unique is applied to the range of elements, only a single copy of each value is left. It returns an iterators after the last legal element in the sequence – the rest of values are undefined. Takes at least forward iterators.

9.35 Reverse Elements

reverse(v1.begin(), v1.end());

reverse the elements. Takes at least bidirectional iterators.

9.36 Locate Element in Sorted Container

i = lower_bound(v1.begin(), v1.end(), 233);

return an iterator to locate the first right location in an ascending-order sequence, at which the value 233 can be inserted, while still keeping the new sequence in ascending order. Takes at least forward iterators.

i = upper_bound(v1.begin(), v1.end(), 233);

return an iterator to locate the last right location in an ascending-order sequence, at which the value 233 can be inserted, while still keeping the new sequence in ascending order. Takes at least forward iterators.

typedef vector<int>::iterator vv1;

pair<vv1, vv1> p1;

p1 = equal_range(v1.begin(), v1.end(), 233);

return a pair of iterators to locate both the first and the last right location in an ascending-order sequence, at which the value 233 can be inserted, while still keeping the new sequence in ascending order. Takes at least forward iterators.

These three algorithms are usually used to locate insertion point in sorted sequences.

9.37 Heapsort

Heapsort is a sorting method, in which an array of elements is arranged into a special binary tree called a heap. The key features of a heap are that the largest element is always at the top of the heap, and the values of the children of any node in the binary tree are always less than or equal to that node’s value. Such a heap is called a maxheap.

make_heap(v1.begin(), v1.end());

take the values and create a heap, so that it can be sorted. Only takes random-access iterators – therefore only can be used on arrays, vectors and deques.

sort_heap(v1.begin(), v1.end());

sort a sequence which had been arranged in a heap. Only takes random-access iterators.

push_heap(v1.begin(), v1.end());

add a new value into a heap. Each time push_heap is called, it assumes that the last element in the vector is the one newly added and all the rest elements have already been arranged as a heap. Only takes random-access iterators.

for ( i = 0; i < v1.size(); ++i)

pop_heap(v1.begin(), v1.end() – i);

swap the top element with the one before v1.end – i. Finally results in a sorted sequence. The method assumes that the range of values specified by the two arguments has already been a heap.

9.38 min & max

determine the minimum and maximum of two containers.

10. ANSI/ISO C++ Standard Language Additions

10.1 bool

In C++ zero represents false and non-zero represents true. But bool type true and false is clearer and is preferred.

10.2 Four Casts

The ANSI/ISO C++ draft standard introduces four new cast operators to replace the old-style cast, which do all kinds of casting jobs. The new casts are less powerful and more specific, and each of them has separate purposes, thus give the program more precise control. Old casting is still legal, but new casts are preferable.

¨ static_cast and dynamic_cast

static_cast operator and dynamic_cast is mainly used to cast up or dow the inheritance hierarchy – to cast base-class objects or pointers to derived-class objects or pointers, or vice versa. Consider the following example:

class Base {

public:

Method1() {};

};

class Derived : public Base {

public:

Method2() {};

};

void DoSomething(Base * pBase)

{

Derived * pDerived1 = static_cast<Derived *>(pBase);

pDerived1->Method2();

Derived * pDerived2 = dynamic_cast<Derived *>(pBase);

pDerived2->Method2();

}

static_cast only perform type checking at compile time. So it is safer than casting with "( )". But it does not perform run-time type checking, so it is the programmer's responsibility to make sure that pBase is pointing to a Derived object. If not, there will be a run time error.

In contrast, dynamic_cast performs RTTI, and if pBase is only pointing to Base object, dynamic_cast will find out and return 0. Therefore, you can decide the whether the cast is successful by checking this:

if(dynamic_cast<Derived *>(ptr) == 0)

¨ const_cast

const_cast casts away const or volatile. It only works on pointers, not objects. You can use it to modify a data member in a const method:

void Test::print() const // Test is a class

{

const_cast<Test *>(this)->member1++; // member1 is a data member of Test

cout << member1 << endl;

}

Such a capability can solve the problem for an intelligent pointer class:

#ifndef POINTER_H

#define POINTER_H

template <class Type>

class Pointer

{

public:

Pointer() : ptr(0), owner(false) {}

Pointer(Type * p) :ptr(p), owner(true) {}

Pointer(const Pointer & obj)

{

ptr = obj. ptr;

if(obj. owner == true)

{

owner = true;

const_cast< Pointer * >(&obj)->owner = false; }

else

owner = false;

}

const Pointer & operator=(const Pointer & obj)

{

ptr = obj. ptr;

if(obj. owner == true)

{

owner = true;

const_cast< Pointer * >(&obj)->owner = false;

}

else

owner = false;

return *this;

}

~ Pointer()

{ if(owner == true) delete ptr; }

int operator==(const Pointer &other) const { return ptr == other. ptr; }

Type * pointer() const { return ptr; }

Type * operator->() const { return ptr; }

Type & operator*() const { return *ptr; }

private:

Type * ptr;

bool owner;

};

#endif

Such a pointer object is fully intelligent: it allows shallow copy – multiple pointer objects pointing to the same server object – to avoid the copy of pointed objects which is most probably unnecessary and time consuming. It knows to delete the server object to which it is pointing, and meanwhile avoiding multiple deleting on the same object.

The key point of this class is an extra data member “owner”, representing whether a pointer object is the owner of its server object. There may be multiple pointer objects pointing to the same server object, but among them there is only one owner, and only the owner will delete the server object.

To accomplish this, in its copy constructor and assignment operator, it transfers the ownership from “right” object to “this” object, setting the right object’s owner to be false. Considering that the right object in these two methods must be constant (otherwise it can not be applied on constant objects), we need const_cast to cast away the const of the right object before we modify its data member “owner”.

¨ reinterpret_cast

reinterpret_cast is used for cases that yeilds implementation-dependent results, such as casing between function pointer types.

Because it allows you to do nonstandard strange conversions, dangerous manipulations can be done, and the result may be machine-dependent.

10.3 Namespace

Each namespace defines a scope where global identifiers and variables are placed:

namespace Test1 {

const double PI = 3.14159;

const double E = 2.71828

void print();

namespace Inner {

enum Day {Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday};

}

void Test1::print()

{ cout << PI << Inner::Monday; }

Notice that unlike classes, there is no semicolon after the definition.

Namespace can be nested.

The definition of a namespace must either occupy the scope or be nested within other namespace.

Because method “print” is a member of namespace Test1, it can directly access other data members of Test1, and access nested namespace Inner through its name.

To use a namespace member, the member’s name must be qualified with the namespace name and binary scope resolution operator:

double area = Test1::PI * 2.0 * Radius;

if(day == Test1::Inner::Tuesday)

In case a namespace’s members are frequently used in a file, to simplify the invoking process, place statement

using namespace Test1;

then in the following part of the file the namespace members do not need to be proceeded with “Test::”. It can also be used to directly access only one member of the namespace:

using namespace Test1::PI;

using namespace std;

informs the compiler that namespace std is being used. The contents of header file <iostream> are all defined as part of namespace std.

Unnamed namespace members occupy the global namespace, are directly accessible and do not have to be qualified with a namespace name. Global variables are actually part of global namespace.

Ideally, in large programs, every entity should be declared in a class, method, block, or namespace. This helps clarify every entity’s role.

Classes do provide us with named scopes, but namespaces allow us to manage things more precisely. Before namespaces were introduced we had to create classes just for the purpose of defining a named scope, and we had to use typedef to get some of the control namespaces give us.

10.4 Run-Time Type Information (RTTI)

Because of polymorphism, when you write your program, you may not be able to know the exact type you are dealing with. But there are cases in which you do need to find out the exact type of the polymorphic object and deal with it differently according to its type. Run-time type information (RTTI) is used to find out the type of a polymorphic object.

Header file <typeinfo> defines operator typeid. It returns a const reference of class type_info, which is a class description of the operand. Class type_info has a method name, which returns a char * which is the type name of the operand.

class Base { ... }

class Derived : public Base { ... }

int main()

{

Base1 * pBase1 = new Derived;

const type_info & t1 = typeid(pBase1);

const type_info & t2 = typeid(*pBase1);

cout << "typeid(pBase1) = " << t1.name() << endl;

cout << "typeid(*pBase1) = " << t2.name() << endl;

}

If class Base and Derived are polymorphic types i.e. they have virtual functions, the output will be:

typeid(pBase1) = class Base1 *

typeid(*pBase1) = class Derived

If class Base and Derived do not have virtual functions, the output will be:

typeid(pBase1) = class Base1 *

typeid(*pBase1) = class Base

10.5 Operator Keywords

The ANSI/ISO standard provides operator keywords that can be used in place of some operators, in case that the keyboard you are using does not have these keys. For example, “and” can be used for “&&”, “bitand” for “&”, etc.

10.6 explicit Constructor

When a method is expecting an object which has a one-argument constructor but you pass the argument, compiler will implicitly call that one-argument constructor and convert that passed argument to the object:

class Base {

public:

Base(int a) : member(a)

{ cout << "Base constructor called with " << a << endl; }

int member;

};

void test(Base obj1)

{ cout << "Base object's member = " << obj1.member; }

int main()

{ test(333); }

The output will be:

Base constructor called with 333

Base object's member = 333

In same cases such an automatic implicit conversion may cause trouble. You can use keyword explicit in front of the constructor to suppress the implicit conversion. If you want to convert, you have to use explicit conversion such as static_cast, without which compiler will reject the conversion and prompt error message.

class Base {

public:

explicit Base(int a) : member(a)

{ cout << "Base constructor called with " << a << endl; }

int member;

};

void test(Base obj1)

{ cout << "Base object's member = " << obj1.member; }

int main()

{

test( static_cast<Base>(333) );

}

10.7 mutable Class Member

C++ provides mutable class member as an alternative to const_cast. A mutable data member is always modifiable even in a const method of a const object. The difference between const_cast and mutable is: for a non-mutable data member in a const method, every time you modify it, you have to use const_cast. This actually reduces the chance of accidental modification.

11. Some Individual Topics

11.1 DOS Keyboard Characteristics

When any key on the keyboard is pressed, a parameter in the operating system is set, and this parameter can be checked through method kbhit( ) in <conio>. Then we can get this character through method getch in <conio>.

Method getch returns one char at a time. If the key can be represented in ASCII, the ASCII code of the character is returned. But many keys cannot be represented in ASCII, for example the F1 key and the PageDown key. When they have been pressed, first a zero is returned, then the next value we get will tell us which key it was.

11.2 Detach Method for Wrapper Classes

Some wrapper classes has a pointer to another dynamic object. In its destructor it usually delete that object to free the space. However, if some one calls its “get” method which returns this pointer to the outside world, then the dynamic object is taken care of by another program and should be “detached” from the wrapper class, so that the destructor is no longer responsible to delete that dynamic object.:

class Person {

private:

char * m_strName;

public:

Person(char * name = NULL) : m_strName(name)

{}

~Person()

{

if(m_strName != 0)

delete m_strName;

}

char * Detach()

{

char * temp;

temp = m_strName;

m_strName = 0;

return temp;

}