WinFS

Microsoft released a beta version of the new file system named "WinFS" (Windows File System).

What is WinFS
"WinFS" is the code name for the next generation storage platform in Windows "Longhorn." Taking advantage of database technologies, Microsoft is advancing the file system into an integrated store for file data, relational data, and XML data. Windows users will have intuitive new ways to find, relate, and act on their information, regardless of what application creates the data. Also, "WinFS" will have built-in support for multi-master data synchronization across other Longhorn machines and other data sources. The platform supports rich managed Longhorn APIs as well as Win32 APIs.

The biggest enhancement to Longhorn that is being pulled from the 2006 release is WinFS (Windows File System), said company officials in Redmond, Wash. WinFS is a next-generation storage subsystem that allows advanced data organization and management and improves the storage and retrieval of files.

The next Windows release won't ship with the WinFS unified storage system, one of the three key components of Longhorn, as outlined by Microsoft at its Professional Developers Conference (PDC) in October 2003.

WinFS has a new synchronisation engine that can index disparate Windows files in a way that would enable users to more easily search and catalogue them. Pretty much what Google does already.
So will "Google Desktop Search" vanish? I don't think so. What do you think?

Const objects and a piece of buggy code!

Question: Here is a malicious piece of code that I came across a few days ago:
[CODE]

#include <stdio.h>
int main(){
const int x=5;
*(int *)&x=10;
printf("Value of x = %d at address= 0x%x\n",*(int*)&x,&(*(int*)&x));
printf("x=%d at address =0x%x\n",x,&x);
const_cast<int&>(x) = 20;
printf("New x=%d at address =0x%x\n",x,&x);
printf("Value of x = %d at address= 0x%x\n",*(int*)&x,&(*(int*)&x));
return 0;
}

It ran well on VC++ 6.0 and gave weird output:
[Output]
Value of x = 10 at address= 0x12ff7c
x=5 at address =0x12ff7c
New x=5 at address =0x12ff7c
Value of x = 20 at address= 0x12ff7c

Isn't that weird and wrong? Explain this behaviour.

Answer: The code results in undefined behaviour.
Explanation: It does not work. It just seems to work. Here is the why and how:

You declared the variable as a const int, so applying the address-of operator to it yields a const int * (for a normal int variable it would yield an int *). You then use the C-style cast (int *) to convert that const int * to an int *. The cast itself compiles, but writing through the resulting pointer modifies an object that was defined const, and that is where the undefined behaviour comes from. The compiler is free to store constants wherever it wants, possibly even in read-only memory, and according to the standard the result of modifying one is undefined. It might appear to work, do nothing, or crash.

Instead of reading x from memory when printing it, the compiler optimizes the code. It knows that you defined x as a const int initialized with 5, so it is free to print that constant directly rather than loading the value from x's memory location. Here's a quote I like:

"The const keyword can't keep you from purposely shooting yourself in the foot. Using explicit type-casting, you can freely blow off your entire leg, because while the compiler helps prevent accidental errors, it lets you make errors on purpose. Casting allows you to "pretend" that a variable is a different type."

The villains are actually these two statements:
1. *(int *)&x=10; and
2. const_cast<int&>(x) = 20;
The first statement is enough to cause the undefined behaviour, but suppose we avoided it; then the second statement becomes the culprit. Herb Sutter, in Item 6 of Exceptional C++, says that removing the constness of an object this way results in undefined behaviour if the actual object is defined as const. That's it. If you are further interested, look at the assembly code for this program (VC++ 6.0 disassembly):
[CODE]
1: #include <stdio.h>
2: int main()
3: {
00401010 push ebp
00401011 mov ebp,esp
00401013 sub esp,44h
00401016 push ebx
00401017 push esi
00401018 push edi
00401019 lea edi,[ebp-44h]
0040101C mov ecx,11h
00401021 mov eax,0CCCCCCCCh
00401026 rep stos dword ptr [edi]
4: const int x=5;
00401028 mov dword ptr [ebp-4],5
5: *(int *)&x=10;
0040102F mov dword ptr [ebp-4],0Ah
6: printf("Value of x = %d at address= 0x%x\n",*(int*)&x,&(*(int*)&x));
00401036 lea eax,[ebp-4]
00401039 push eax
0040103A mov ecx,dword ptr [ebp-4]
0040103D push ecx
0040103E push offset string "Value of x = %d at address= 0x%x"... (00420058)
00401043 call printf (004010d0)
00401048 add esp,0Ch
7: printf("x=%d at address =0x%x\n",x,&x);
0040104B lea edx,[ebp-4]
0040104E push edx
0040104F push 5
00401051 push offset string "x=%d at address =0x%x\n" (0042003c)
00401056 call printf (004010d0)
0040105B add esp,0Ch
8: const_cast<int&>(x) = 20;
0040105E mov dword ptr [ebp-4],14h
9: printf("New x=%d at address =0x%x\n",x,&x);
00401065 lea eax,[ebp-4]
00401068 push eax
00401069 push 5
0040106B push offset string "New x=%d at address =0x%x\n" (0042001c)
00401070 call printf (004010d0)
00401075 add esp,0Ch
10: printf("Value of x = %d at address= 0x%x\n",*(int*)&x,&(*(int*)&x));
00401078 lea ecx,[ebp-4]
0040107B push ecx
0040107C mov edx,dword ptr [ebp-4]
0040107F push edx
00401080 push offset string "Value of x = %d at address= 0x%x"... (00420058)
00401085 call printf (004010d0)
0040108A add esp,0Ch
11: return 0;
0040108D xor eax,eax
12: }
0040108F pop edi
00401090 pop esi
00401091 pop ebx
00401092 add esp,44h
00401095 cmp ebp,esp
00401097 call __chkesp (00401150)
0040109C mov esp,ebp
0040109E pop ebp
0040109F ret

I am not very familiar with assembly language myself (yet), but I think this snippet is easy enough to get a feel for what is happening. The instructions that matter are the two "push 5" lines (at 0040104F and 00401069) that precede the printf calls for source lines 7 and 9. Instead of loading the value of x from its address ([ebp-4]), the compiler, as an optimization, simply pushes the constant 5, even though by then the storage location of x holds a different value.
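
For contrast, here is a minimal sketch of the one case where casting away const is well defined: the object itself was not declared const, only the path used to reach it is. The helper names set_through_const_ref and modify_non_const_object are hypothetical and exist only for this illustration.

```cpp
// Hypothetical helper: receives a read-only view but writes through it.
// This is well defined ONLY when the referenced object was not declared const.
void set_through_const_ref(const int& view, int value) {
    const_cast<int&>(view) = value;
}

// Returns the value read back after the write.
int modify_non_const_object() {
    int y = 5;                     // y itself is not const
    set_through_const_ref(y, 20);  // fine: only the access path was const
    return y;                      // the compiler may not assume y is still 5
}
```

Had y been declared const int, the same call would be the undefined behaviour discussed above. It is usually cleaner to avoid the cast altogether and take an int& when the function needs to write.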

So, people, beware of ending up with code that invokes undefined behaviour. It may look very convincing and correct, and that leads to buggy code that is very hard to fix. I hope you all liked and enjoyed this post.

P.S. - Please report comments, corrections, and suggestions at this blog. I am learning.

www.kamal.org

Today is one of the happiest days of my life, as I got my own domain: www.kamal.org. One of my friends, Maduranga, helped me a lot in purchasing it. Thanks a lot, Maduranga.

I thought of redirecting it to my blog. Later I will buy some space on the web to host my home site (which is not ready yet).

As yesterday was my 25th birthday, I will never forget the day I got my domain.

So a grand ice cream treat was ready for all my friends in the office. (If you were not in my office!!! hard luck. You may have missed it.)

How to avoid the top five IT mistakes

I got a chance to read an article named "How to avoid the top five IT mistakes".

The five mistakes are:
1. Too much customization
2. Skimping on training
3. Absentee management
4. Freewheeling consultants
5. Assuming you're finished

Point number 1 is something I also keep raising in project meetings, as most of the time people ask for customizations they would never use. And sometimes providing that much customization leads to a mess.

What do you think?

Variadic functions and non-POD arguments

This is about an issue I came across a few days back. I really liked it because I helped someone understand a few of the concepts involved, and learned in the process myself. At its heart it involves variadic functions. I quote the problem below:

Problem:

There is a class. (I will not provide the code, as it would make the post long; it is easily understandable in words, and if you feel like it you can experiment with some code of your own.) Its members are:

1. A constructor with no arguments.
2. A copy constructor.
3. A destructor.
4. An overloaded (++) operator. (not that significant as far as the solution to the issue is concerned)
5. A private data member. (not that significant as far as the solution to the issue is concerned)

We create an object of this class and call printf once, passing it the object. printf and scanf are functions of a truly different breed. You must have heard about variadic functions: they can take any number of arguments of different types; that is their speciality. What happens during this whole process in main is that a constructor is called twice (the second being a call to the copy constructor) and the destructor is called just once! Is there some kind of memory leak? What happened, and what is the reason for this awkward behaviour?

Short answer:

The behaviour in the given scenario is undefined. That means you got these results on one compiler, you might get different results on another, and there is no guarantee that the application will not crash. I tried running the code on my machine with VC++ 6.0 and it worked fine. That is undefined behaviour for you... we never know.
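
For readers who want to experiment, here is a rough sketch of a class like the one described in the problem. The name Tracker, the static counters, and the helper functions are my own assumptions, not the original code. The sketch passes the object by value to an ordinary function, where constructions and destructions balance; the printf call from the problem is left commented out because it is exactly the undefined case.

```cpp
// Hypothetical stand-in for the class described in the problem.
class Tracker {
public:
    static int constructed;  // counts default and copy constructions
    static int destroyed;    // counts destructor calls

    Tracker() : value_(0) { ++constructed; }
    Tracker(const Tracker& other) : value_(other.value_) { ++constructed; }
    ~Tracker() { ++destroyed; }
    Tracker& operator++() { ++value_; return *this; }
    int value() const { return value_; }

private:
    int value_;
};

int Tracker::constructed = 0;
int Tracker::destroyed = 0;

// An ordinary (non-variadic) function taking its argument by value.
int read_by_value(Tracker t) { return t.value(); }

// Returns constructions minus destructions after one by-value call.
int balance_after_ordinary_call() {
    Tracker::constructed = 0;
    Tracker::destroyed = 0;
    {
        Tracker t;
        ++t;
        read_by_value(t);      // copy is constructed, then destroyed
        // printf("%d\n", t);  // passing t through "..." is the undefined case
    }
    return Tracker::constructed - Tracker::destroyed;
}
```

With an ordinary parameter the compiler knows the type on both sides of the call, so every copy it makes is also destroyed. That bookkeeping is what gets lost when the object travels through the variable argument list.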


Explanation:

We will approach the solution generically and not restrict the reasoning to printf. Since printf and scanf are examples of variadic functions, let's start with variadic functions in general. To know more about them, I suggest these two links:
Variadic Functions : gnu documentation
and Variadic functions : informit article.

There are various restrictions imposed whenever you use variadic functions in C++. Below I state the scenarios that lead to this undefined behaviour, with quotes from these links that I find significant.

1. Simply because the rules say so (hehe :-)), you cannot use non-POD types in the variable argument list of a variadic function. Doing so results in undefined behaviour. It's about time I defined a non-POD type, or rather a POD type. Here's the definition (quoted from the C++ FAQ Lite, a collection by Marshall Cline). POD type = Plain Old Data type.

[Definition]
A POD type is a C++ type that has an equivalent in C, and that uses the same rules as C uses for initialization, copying, layout, and addressing. As an example, the C declaration struct Fred x; does not initialize the members of the Fred variable x. To make this same behavior happen in C++, Fred would need to not have any constructors. Similarly to make the C++ version of copying the same as the C version, the C++ Fred must not have overloaded the assignment operator. To make sure the other rules match, the C++ version must not have virtual functions, base classes, non-static members that are private or protected, or a destructor. It can, however, have static data members, static member functions, and non-static non-virtual member functions. The actual definition of a POD type is recursive and gets a little gnarly.

Here's a slightly simplified definition of POD: a POD type's non-static data members must be public and can be of any of these types: bool, any numeric type including the various char variants, any enumeration type, any data-pointer type (that is, any type convertible to void*), any pointer-to-function type, or any POD type, including arrays of any of these. Note: data-pointers and pointers-to-function are okay, but pointers-to-member are not. Also note that references are not allowed. In addition, a POD type can't have constructors, virtual functions, base classes, or an overloaded assignment operator.


2. C++ has two mechanisms for passing arguments, by value and by reference, compared to just pass-by-value in C. The ambiguity that creeps in is that both look the same at the call site! So the compiler (and the variadic function) cannot tell whether an argument in the variable part is meant to be passed by value or by reference, that is, whether a copy is to be made or not. This could be fixed by forcing a single option, always pass-by-value (by-reference cannot be chosen, as variadic functions need to keep working for C as well). Calling conventions (point 3 below) make this more complicated, as you will see.


3. The function calling conventions we will consider are __stdcall and __cdecl. You can read up on them on MSDN with a simple search. With __cdecl the caller is responsible for cleaning up the stack, and with __stdcall the callee has this responsibility. Variadic functions impose certain basic rules, as stated in point (1) above, due to which __stdcall would not work. __cdecl would possibly behave similarly, but for non-POD types it is common for the callee to call the respective destructors (I don't have proof to back this, but you can take my word for it, or research it with the assembly code), or at least the behaviour is not standardized, so you cannot rule it out. Now here's the problem: the function doesn't know the number and types of its arguments, so this step would cause issues... dreadful issues. It therefore simply assumes what I stated in point (2), that the arguments being passed are POD types (as against the non-POD types actually being passed), and the destructor doesn't (or, better, may not) get called! This is a problem.

4. There are certain restrictions on variadic arguments/functions that are simply rules to rule out the problematic cases:

    4.1) A function that accepts a variable number of arguments must be declared with a prototype that says so. You write the fixed arguments as usual, and then tack on `...' to indicate the possibility of additional arguments. The syntax of ISO C requires at least one fixed argument before the `...'.

    4.2) For some C compilers, the last required argument must not be declared register in the function definition. Furthermore, this argument's type must be self-promoting: that is, the default promotions must not change its type. This rules out array and function types, as well as float, char (whether signed or not) and short int (whether signed or not). This is actually an ISO C requirement.

    4.3) The parameter parmN is the identifier of the rightmost parameter in the variable parameter list of the function definition (the one just before the ...). If the parameter parmN is declared with a function, array, or reference type, or with a type that is not compatible with the type that results when passing an argument for which there is no parameter, the behavior is undefined. (C++ ANSI standard, 18.7.3)

    4.4) Lastly, when there is no parameter for a given argument, the argument is passed in such a way that the receiving function can obtain the value of the argument by invoking va_arg (18.7). ...if the argument does not have arithmetic, enumeration, pointer, pointer to member, or class type, the program is ill-formed. If the argument has a non-POD class type, the behavior is undefined. (ANSI C++ 5.2.2.7)

Apart from all that, there might be a way to implement variadics such that they just work, but the points above are useful to know: under the current rules, they tell you when variadic functions work and when they don't.
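
The POD restriction in rule 4.4 can be checked at compile time. This is a minimal sketch, assuming a C++11 or later compiler (these type traits did not exist in VC++ 6.0). PlainPoint, Guarded, and is_pod_like are hypothetical names made up for the illustration. Note that C++11 reworded the POD definition as "trivial and standard-layout", which is a little looser than the FAQ text quoted above.

```cpp
#include <type_traits>

// Hypothetical C-style struct: no constructors, destructor, or virtuals.
struct PlainPoint {
    int x;
    int y;
};

// Hypothetical class shaped like the one in the problem statement.
class Guarded {
public:
    Guarded() : n_(0) {}
    Guarded(const Guarded& other) : n_(other.n_) {}
    ~Guarded() {}

private:
    int n_;
};

// True when T is both trivial and standard-layout (the C++11 meaning of POD).
template <typename T>
constexpr bool is_pod_like() {
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

static_assert(is_pod_like<PlainPoint>(), "PlainPoint is safe to pass through ...");
static_assert(!is_pod_like<Guarded>(), "Guarded has user-provided special members");
```

Guarded fails the check because of its user-provided constructor, copy constructor, and destructor, which is the same property that makes it unsuitable for a variable argument list.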

Conclusion:

So I conclude that variadic functions are quite specific to C. You cannot use non-POD C++ types with them; for manipulating non-POD types, you should use C++ techniques. You are quite restricted in using variadics in C++, but they are still sometimes needed, and they should be used carefully, with POD types only. The two articles linked above illustrate these points in further detail.

Here are a few alternatives to these functions in C++ (use one of them, or combine them):
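
As an illustration of a variadic function that stays within these rules, here is a small sketch that only ever reads int arguments. The function name sum_ints is hypothetical; va_list, va_start, va_arg, and va_end are the standard macros from <cstdarg>.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical helper: sums `count` int arguments that follow the fixed parameter.
int sum_ints(int count, ...) {
    assert(count >= 0);
    va_list args;
    va_start(args, count);           // count is the last fixed parameter (parmN)
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);  // int is unchanged by the default promotions
    }
    va_end(args);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) adds up the three trailing values. The caller has to get the count and the types right, because the function has no way to verify either.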

1. Functions with Default Arguments
2. Usage of Function Templates
3. Packing Function Arguments in a Container
4. Overloading operator <<
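
Here is a rough sketch that combines several of the alternatives listed above. The class Celsius and the function join are hypothetical names invented for the example; the streams, strings, and vectors come from the standard library.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical non-POD type that must not be passed to printf.
class Celsius {
public:
    explicit Celsius(int degrees) : degrees_(degrees) {}
    int degrees() const { return degrees_; }

private:
    int degrees_;
};

// Alternative 4: overload operator<< so the type knows how to print itself.
std::ostream& operator<<(std::ostream& out, const Celsius& c) {
    return out << c.degrees() << " C";
}

// Alternatives 1 to 3: a function template with a default argument that
// receives its "variable" arguments packed in a container.
template <typename T>
std::string join(const std::vector<T>& items, const std::string& separator = ", ") {
    std::ostringstream out;
    for (typename std::vector<T>::size_type i = 0; i < items.size(); ++i) {
        if (i != 0) out << separator;
        out << items[i];  // uses whatever operator<< exists for T
    }
    return out.str();
}
```

Because the element type is known at compile time, the compiler picks the right operator<< and handles copies and destructors properly, which is the type information a variable argument list throws away.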

You can also refer to the codeguru thread that is the basis for this post, and that helped me digest the above as a rule: Destruction (variadic woes).

P.S. - Please report comments, corrections, and suggestions at this blog. I am learning.

Check out this stream