Initialization lists and base class members

Some thoughts on a question I ran across: why can you not initialize base class members in a derived class? Put another way, why can you not name base class data members in a derived class's initialization list? The simplest reason: the standard, the language rules, do not allow it. But let's have some fun with the "what-if"s.

One reason that struck me almost immediately is that the data member could be private, in which case you would not have access to it in the derived class. But what if it is public or protected? Let us take an example:


    class Base
    {
    public:
        Base(){}
        Base(int member_) : member (member_){}
        int member;
        //other members
    };

    class Derived : public Base
    {
    public:
        Derived(int member_) : member(member_){} // error: names a base class member
        //other members
    };

    int main()
    {
        Derived derivedObject(10);
    }


You might expect this to work, since the base member is public and therefore accessible in the derived class. After all, it is perfectly fine to assign to it in the derived class constructor body! What is so special about initialization lists that only direct base classes, virtual bases and the class's own data members can appear in them?

You might think that the base class subobject has not been allocated, or does not exist yet, when the derived class's initialization list runs, and that this is why it is not allowed. But that reasoning is flawed. Why? For the following reason (quoting section 12.6.2 (5) of the standard):
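To make the contrast concrete, here is a minimal sketch of the variant that does compile: the derived constructor assigns to the inherited member in its body instead of naming it in the initialization list. The names `DerivedAssign` and `demoBodyAssignment` are made up for this illustration and are not part of the example above.

```cpp
#include <cassert>

class Base
{
public:
    Base() {}
    Base(int member_) : member(member_) {}
    int member;
};

// Hypothetical variant of Derived, renamed for this sketch.
class DerivedAssign : public Base
{
public:
    // Writing ": member(member_)" here would be rejected by the compiler.
    DerivedAssign(int member_)
    {
        // Base() has already run at this point, so this is an assignment,
        // not an initialization.
        member = member_;
    }
};

// Helper for the illustration: builds an object and returns the member.
int demoBodyAssignment(int value)
{
    DerivedAssign derivedObject(value);
    assert(derivedObject.member == value);
    return derivedObject.member;
}
```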

    Initialization shall proceed in the following order:

    - First, and only for the constructor of the most derived class as
      described below, virtual base classes shall be initialized in the
      order they appear on a depth-first left-to-right traversal of the
      directed acyclic graph of base classes, where “left-to-right” is
      the order of appearance of the base class names in the derived
      class base-specifier-list.

    - Then, direct base classes shall be initialized in declaration order
      as they appear in the base-specifier-list (regardless of the order
      of the mem-initializers).

    - Then, non-static data members shall be initialized in the order
      they were declared in the class definition (again regardless of
      the order of the mem-initializers).

    - Finally, the body of the constructor is executed.

    [ Note: the declaration order is mandated to ensure that base and
      member subobjects are destroyed in the reverse order of
      initialization. end note ]

So the base is already initialized before anything else in the initialization list happens. What could the reason be, then? Probably that it does not make sense! Once the base constructor has initialized the base member, initializing it again from the derived class is meaningless. How can one thing be initialized twice? The second time, it has to be an assignment.

But then again, that holds only for non-POD members. For the object constructed in the code above, via the default constructor of the base class, the POD member "member" remains uninitialized! It has an indeterminate value. So it would not be double initialization, would it? It would be initialized just once, from the derived class constructor's initialization list.

Now suppose there were a further derived class, publicly derived from the above "Derived" class, using the same member initialization syntax. That makes two classes trying to initialize the same member. This could probably have been handled by some complicated set of rules, but why add that logical overhead?

That is not all, though. The *rules* would get more complex still: different access specifiers (private/protected), different kinds of inheritance (private/protected), an explicit base constructor call that also initializes the member, virtual bases, different treatment for POD and non-POD types, and what not. Things get too complex and messy if you allow that.

Simply put, base class members should be the base class's responsibility, and a derived class should only be concerned with the construction abstraction the base class provides in the form of its constructors and their initialization lists. Tighter coupling is usually a sign of a poor design choice, where things start to fall apart as soon as something changes. That is not good code.

To keep things clear and simple, it is best to accept that the standard does not let initialization lists take on the responsibility of initializing base class members. They initialize only the direct bases, the virtual bases and the class's own members.
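The ordering quoted from the standard can be observed directly. The sketch below logs each construction step and also reads the base member from a derived mem-initializer, which is legal precisely because the base subobject is complete by then. The `Tracer` type, the global `initLog` and the function `constructionOrder` are assumptions introduced only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records construction steps; exists only for this illustration.
std::vector<std::string> initLog;

struct Tracer
{
    explicit Tracer(const char* step) { initLog.push_back(step); }
};

struct Base
{
    Base() : baseTracer("Base member"), member(42)
    {
        initLog.push_back("Base body");
    }
    Tracer baseTracer;
    int member;
};

struct Derived : Base
{
    // Base() runs implicitly before any of these mem-initializers.
    // Reading Base::member as an initializer value is fine; naming it
    // as the thing being initialized is what the language rejects.
    Derived() : derivedTracer("Derived member"), copyOfMember(member)
    {
        initLog.push_back("Derived body");
    }
    Tracer derivedTracer;
    int copyOfMember;
};

// Constructs one Derived object and returns the recorded step order.
std::vector<std::string> constructionOrder()
{
    initLog.clear();
    Derived d;
    assert(d.copyOfMember == 42);
    return initLog;
}
```

The recorded order should be base members, base body, derived members, derived body, matching the list in the quotation.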
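In code, "using the construction abstraction provided by the base" simply means naming the base class in the initialization list and letting its constructor do the work. A minimal sketch follows; the `MoreDerived` class and the `demoForwarding` helper are hypothetical additions for illustration.

```cpp
#include <cassert>

class Base
{
public:
    explicit Base(int member_) : member(member_) {}
    int member;
};

class Derived : public Base
{
public:
    // The direct base appears in the list; Base initializes its own member.
    explicit Derived(int member_) : Base(member_) {}
};

// Hypothetical further-derived class: it only talks to its direct base.
class MoreDerived : public Derived
{
public:
    explicit MoreDerived(int member_) : Derived(member_) {}
};

// Helper for the illustration: the value travels up the constructor chain.
int demoForwarding(int value)
{
    MoreDerived object(value);
    assert(object.member == value);
    return object.member;
}
```

Each class initializes `member` exactly once, in exactly one place, no matter how deep the hierarchy grows.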

Have fun... Cheers!

Distributed UUID Generation

How do you generate a unique ID in a distributed environment with scalability in mind?

Also, the following feature requirements are given ...
  • The ID must be guaranteed globally unique (this rules out probabilistic approaches)
  • The assigned ID will be used by the client for an unbounded time period (this means that once assigned, the ID is gone forever and cannot be reused for a subsequent assignment)
  • The ID is short (say 64 bits) compared to a machine-unique identifier (this rules out using a combination of MAC address + process ID + thread ID + timestamp to create a unique ID)

Architectural goals ...
  • The solution needs to scale with growth in request rates
  • The solution needs to be resilient to component failure

General Solution

Using a general distributed computing scheme, there will be a pool of ID generators (called "workers") residing on many inter-connected, cheap, commodity machines.

Depending on the application, the client (which requests a unique ID) may be colocated on the same machine as the worker, or it may reside on a separate machine. In the latter case, a load balancer sits between the clients and the workers to make sure the client workload is evenly distributed.

Central Bookkeeper

One straightforward (maybe naive) approach is to have the worker (i.e. the ID generator) make a request to a centralized book-keeping server that maintains a counter. Obviously this central book-keeper can be a performance bottleneck as well as a single point of failure (SPOF).

The performance bottleneck can be mitigated by having the worker ask the centralized book-keeper for a "number range" rather than a single "id". ID assignment is then done locally by the worker within this range; the book-keeper is contacted again only when the whole range is exhausted.

When the book-keeper receives a request for a number range, it persists the allocated range to disk before returning it to the requesting worker. So if the book-keeper crashes, it knows where to resume allocating after reboot. To prevent the disk itself from being a SPOF, mirrored disks should be used.

The SPOF problem of the book-keeper can be addressed by having a pair of book-keepers (primary/standby). The primary book-keeper needs to synchronize its counter state to the standby via shared disks or by replicating counter changes. The standby continuously monitors the health of the primary and takes over its role when it crashes.
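The range idea can be sketched in a few lines of C++. This is an in-process, single-threaded stand-in for what would really be a network call; the names `BookKeeper`, `Worker`, `IdRange` and `demoRangeAllocation` are assumptions made for this illustration, and counter overflow is ignored.

```cpp
#include <cassert>
#include <cstdint>

// Half-open range [first, last) of ids handed to one worker.
struct IdRange
{
    std::uint64_t first;
    std::uint64_t last;
};

// Stand-in for the central book-keeper: it only hands out ranges.
class BookKeeper
{
public:
    explicit BookKeeper(std::uint64_t rangeSize) : next_(0), rangeSize_(rangeSize) {}

    IdRange allocateRange()
    {
        IdRange range = { next_, next_ + rangeSize_ };
        next_ += rangeSize_;
        return range;
    }

private:
    std::uint64_t next_;
    std::uint64_t rangeSize_;
};

// A worker assigns ids locally and contacts the book-keeper only
// when its current range is used up.
class Worker
{
public:
    explicit Worker(BookKeeper& keeper)
        : keeper_(keeper), range_(keeper.allocateRange()), next_(range_.first) {}

    std::uint64_t nextId()
    {
        if (next_ == range_.last)
        {
            range_ = keeper_.allocateRange();
            next_ = range_.first;
        }
        return next_++;
    }

private:
    BookKeeper& keeper_;
    IdRange range_;
    std::uint64_t next_;
};

// Two workers sharing one book-keeper never hand out the same id.
void demoRangeAllocation()
{
    BookKeeper keeper(3);
    Worker a(keeper);               // holds [0, 3)
    Worker b(keeper);               // holds [3, 6)
    const std::uint64_t a0 = a.nextId();
    const std::uint64_t b0 = b.nextId();
    assert(a0 == 0 && b0 == 3);
}
```

With a range size of 3, the book-keeper is contacted once per three ids instead of once per id; a larger range trades fewer round trips for more ids wasted when a worker dies.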
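The "persist first, reply second" rule might look like the following sketch. It stores the next unallocated number in a small text file; the class name `PersistentBookKeeper`, the file format and the helper `restartDemo` are assumptions for illustration only. A real implementation would also need to force the write to stable storage, which a stream flush alone does not guarantee.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <stdexcept>
#include <string>

class PersistentBookKeeper
{
public:
    PersistentBookKeeper(const std::string& path, std::uint64_t rangeSize)
        : path_(path), rangeSize_(rangeSize), next_(0)
    {
        // After a restart, resume from the last persisted counter, if any.
        std::ifstream in(path_.c_str());
        std::uint64_t saved = 0;
        if (in >> saved)
            next_ = saved;
    }

    // Returns the first id of a new range [first, first + rangeSize).
    std::uint64_t allocateRange()
    {
        const std::uint64_t first = next_;
        const std::uint64_t newNext = first + rangeSize_;

        // Persist the new counter BEFORE handing the range out.
        std::ofstream out(path_.c_str(), std::ios::trunc);
        out << newNext << '\n';
        out.flush();
        if (!out)
            throw std::runtime_error("could not persist counter");

        next_ = newNext;
        return first;
    }

private:
    std::string path_;
    std::uint64_t rangeSize_;
    std::uint64_t next_;
};

// Simulates a crash and restart: allocates two ranges, drops the
// book-keeper, creates a new one on the same file, allocates again.
std::uint64_t restartDemo(const std::string& path)
{
    std::remove(path.c_str());
    {
        PersistentBookKeeper keeper(path, 100);
        const std::uint64_t first = keeper.allocateRange();
        const std::uint64_t second = keeper.allocateRange();
        assert(first == 0 && second == 100);
    }
    std::uint64_t afterRestart = 0;
    {
        PersistentBookKeeper restarted(path, 100);
        afterRestart = restarted.allocateRange();
    }
    std::remove(path.c_str());
    return afterRestart;
}
```

Because the counter is written before the range is returned, a crash can at worst waste a range; it can never cause the same range to be handed out twice.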


Peer Multicast

Instead of using a central book-keeper to track the number range assignment, we can replicate this information among the workers themselves. Under this scheme, every worker keeps track of its current allocated range as well as the highest allocated range. So when a worker exhausts its current allocated range, it broadcasts a "range allocation request" to all other workers, waits for all of their acknowledgments, and then updates its current allocated range.

It is possible for multiple workers to make such requests at the same time. This kind of conflict can be resolved by distributed coordination algorithms (there are many well-known ones, the bully algorithm being one of them).

For performance reasons, the worker doesn't persist its most recently allocated id. If the worker crashes, it requests a brand-new range after it boots up again, even if the previous range was not fully used.
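A toy, single-process simulation of this bookkeeping is sketched below. The "broadcast" is a plain loop over peer pointers and every peer acknowledges immediately, so concurrent requests and failures are deliberately left out. The `PeerWorker` class and its method names are assumptions for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class PeerWorker
{
public:
    explicit PeerWorker(std::uint64_t rangeSize)
        : rangeSize_(rangeSize), next_(0), last_(0), highestAllocated_(0) {}

    // The peer list must not contain this worker itself.
    void setPeers(const std::vector<PeerWorker*>& peers) { peers_ = peers; }

    std::uint64_t nextId()
    {
        if (next_ == last_)
            claimNewRange();
        return next_++;
    }

    // A peer announces that ids below newHighest are now taken.
    // The return value plays the role of the acknowledgment.
    bool onRangeAllocated(std::uint64_t newHighest)
    {
        if (newHighest > highestAllocated_)
            highestAllocated_ = newHighest;
        return true;
    }

private:
    void claimNewRange()
    {
        const std::uint64_t first = highestAllocated_;
        const std::uint64_t last = first + rangeSize_;

        // "Broadcast" the request and wait for every acknowledgment.
        for (std::size_t i = 0; i < peers_.size(); ++i)
        {
            const bool acked = peers_[i]->onRangeAllocated(last);
            assert(acked);
            (void)acked;
        }

        // Only now does the worker start using the range [first, last).
        highestAllocated_ = last;
        next_ = first;
        last_ = last;
    }

    std::uint64_t rangeSize_;
    std::uint64_t next_;              // next id to hand out
    std::uint64_t last_;              // end (exclusive) of the current range
    std::uint64_t highestAllocated_;  // highest range end seen so far
    std::vector<PeerWorker*> peers_;
};
```

Every worker ends up with the same view of the highest allocated range, so a new range always starts where the last known one ended.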


Distributed Hashtable

Using the "key-based routing" model of a DHT, each worker creates a "DHT node" (with a randomly generated 64-bit node id) and joins a "DHT ring". Under this scheme, the number range is allocated implicitly (between the node id and its immediate neighbor's node id).

Now we can exploit a number of nice characteristics of the DHT model: a large number of users' machines can serve as workers, with O(log N) routing hops. Also, each DHT node holds replicated data of its neighbors, so that if one DHT node crashes, its neighbor takes over its number range immediately.

Now, what if a DHT node has exhausted its implicitly allocated number range? When this happens, the worker starts a new DHT node (which joins at some point in the ring and thereby gets a newly assigned number range).
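The implicit range can be illustrated with an ordered set standing in for the ring. This sketch ignores routing, replication and networking entirely; it only shows how a node id and its clockwise neighbor delimit a range. The `DhtRing` class and its methods are assumptions for illustration, and in practice the node ids would be random rather than hand-picked.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Local model of the ring: just the sorted set of node ids.
class DhtRing
{
public:
    void join(std::uint64_t nodeId) { nodes_.insert(nodeId); }

    // End (exclusive) of the range implicitly owned by nodeId:
    // the id of the next node clockwise, wrapping around the ring.
    std::uint64_t rangeEnd(std::uint64_t nodeId) const
    {
        assert(nodes_.count(nodeId) == 1);
        std::set<std::uint64_t>::const_iterator it = nodes_.upper_bound(nodeId);
        return it == nodes_.end() ? *nodes_.begin() : *it;
    }

    // The node whose implicit range contains the given id.
    std::uint64_t ownerOf(std::uint64_t id) const
    {
        assert(!nodes_.empty());
        std::set<std::uint64_t>::const_iterator it = nodes_.upper_bound(id);
        if (it == nodes_.begin())
            return *nodes_.rbegin();   // before the first node: wraps to the last
        --it;
        return *it;
    }

private:
    std::set<std::uint64_t> nodes_;
};
```

For example, with nodes 100, 500 and 900 on the ring, node 100 implicitly owns [100, 500). If a new node joins at 300, node 100's range shrinks to [100, 300) without any explicit allocation message.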

Threading Building Blocks

Just a quick note. Intel recently made their TBB library open source, and since it takes a different, task-based approach to incorporating parallelism in C++, it looked interesting. It uses threads internally but keeps user code away from the threads themselves by parallelizing the actions/tasks the user performs in his/her code. Looks nice, sounds good: threads are basically a low-level detail on the way to the benefits of parallel processing, and a layer of abstraction that insulates programmers from those details could ease and speed up the development of parallel processing within applications, without getting into the nitty-gritty of threads. By analogy, do we deal with the parallel processing across FPUs ourselves?

A few days back I downloaded the open-sourced Intel Threading Building Blocks library and started to test a few samples with it.

One of the sample solutions in there used the parallel_for algorithm (parallel_for.h). It gave me a fatal error that the file affinity.h could not be found. It did not exist! This file was being included in parallel_for.h.

After much searching and a quick pass through the docs, I could not find the reason, but I luckily bumped into a thread in the library's discussion forum where one user complained about the file missing from the development release.

The suggestions to fix this were:

1. Either comment out the include.
2. Or add an empty file named affinity.h in the include path.

This resolves the compilation issue, but why keep such an include anyway? It wastes the time of everyone who is new to the project and wants to quickly build the samples, test and debug them, and see the library in action. They will probably fix this soon.

More information can be found here - Latest Developer Release Missing affinity.h.

Hope this helps anyone who is starting out with TBB and runs into the same problem until that header comes alive in the project. Meanwhile I will go back to experimenting more with it. Good luck and have fun! :)

Something about Local Classes

We all know that identifiers (variables, objects, functions etc.) may have two scopes in C++. They may be declared as global or local to a block.

We often see variables and functions declared both locally and globally, but there is one kind of identifier that is not commonly declared as local; yeah, you guessed right, it's classes.

You might have noticed that classes are almost always declared as global even when they are to be used in only one block. There are some reasons for this, which we'll discuss later.

First let’s have a look at a class declared locally:



    // this code contains a local class
    #include <iostream>

    void func();

    int main()
    {
        // myclass unknown here
    }

    void func()
    {
        class myclass
        {
            ...
            ...
        };
    }


While classes may be defined locally, there are some restrictions on what can and cannot be done with them. They are listed below:

  • Member functions must be defined inside the class (which makes them implicitly inline)
  • Members of a local class cannot use the automatic (non-static) variables of the enclosing block
  • It cannot have static data members

Here is an example program:



    // this code contains a local class
    #include <iostream>

    void func();

    int main()
    {
        // myclass unknown here
        func();
    }

    void func()
    {
        int num;   // automatic variable of the enclosing block

        class myclass
        {
            int a;

        public:
            // member functions MUST be defined
            // inside the class
            void set(int x)
            {
                // can't access num
                a = x;
            }
            void show()
            {
                std::cout << a << std::endl;
            }
        };

        myclass ob;
        ob.set(10);
        ob.show();
    }
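The restriction on enclosing variables applies only to automatic ones; a local class may still use static variables of the enclosing function. The sketch below shows that difference. The function `countCalls`, the class `Counter` and the helper `checkCountCalls` are made-up names for this illustration.

```cpp
#include <cassert>

int countCalls()
{
    static int calls = 0;   // static local: usable inside the local class
    int scratch = 0;        // automatic local: NOT usable inside the class
    (void)scratch;

    class Counter
    {
    public:
        // Static member FUNCTIONS are allowed in a local class;
        // only static DATA members are not.
        static int bump() { return ++calls; }

        // int bad() { return scratch; }   // would not compile
    };

    return Counter::bump();
}

// Helper for the illustration: each call sees the shared static counter.
void checkCountCalls()
{
    const int first = countCalls();
    const int second = countCalls();
    assert(second == first + 1);
    (void)first;
    (void)second;
}
```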

