String Shift and Print

After my last blog post, someone pointed out a similar problem here. The problem goes:

Write a program that generates from the string "abcdefghijklmnopqrstuvwxyz{" the following:
a
bcb
cdedc
defgfed
efghihgfe
fghijkjihgf
ghijklmlkjihg
hijklmnonmlkjih
ijklmnopqponmlkji
jklmnopqrsrqponmlkj
klmnopqrstutsrqponmlk
lmnopqrstuvwvutsrqponml
mnopqrstuvwxyxwvutsrqponm
nopqrstuvwxyz{zyxwvutsrqpon

Well there is no time limit this time and we dont even need to use google. I think this is slightly trickier than the last one.

Here is my solution:



//Program tested on Microsoft Visual Studio 2008 - Zahid Ghadialy

#include<iostream>
#include<string>

using namespace
std;

int
main()
{

string str = "abcdefghijklmnopqrstuvwxyz{";
unsigned
strlen = str.length();
string tempStr;
int
loop = 0;
while
(tempStr.length() < strlen)
{

int
start1 = loop;
int
count1 = loop + 1;
tempStr = str.substr(start1,count1);

count1--;
for
(; count1 > 0; count1--)
tempStr.append(str.substr(start1+count1-1, 1));

cout<<tempStr;
cout<<endl;
loop++;
}

return
0;
}




ASCII Print

Few weeks back, someone I know got asked the following question in an interview:

Write a C++ program in 15 minutes to print the following in the output:
a
bcb
cdedc
defgfed
efghihgfe
fghijkjihgf
ghijklmlkjihg
hijklmnonmlkjih
ijklmnopqponmlkji
jklmnopqrsrqponmlkj
klmnopqrstutsrqponmlk
lmnopqrstuvwvutsrqponml
mnopqrstuvwxyxwvutsrqponm
nopqrstuvwxyz{zyxwvutsrqpon

You can do search on the web for any information but all your searches would be logged and used to assess your answer.

Well my obvious guess would be to take help of ASCII table. My only search was for Ascii table in google. I did try and make it in less than 15 mins. My solution is below but do try it on your own before looking at my solution.


//Program tested on Microsoft Visual Studio 2008 - Zahid Ghadialy
#include<iostream>
#include<string>

using namespace
std;

int
main()
{

char
c=(char)97; //'a' in ASCII
char loop = 0;
while
(c <= 'n')
{

char
x = 0;
for
(; x <=loop; x++)
{

cout<<char(c+x);
}

for
(char y = 1; y < x; y++)
{

cout<<char(c+x-y-1);
}

cout<<endl;
c++;
loop++;
}


return
0;
}



Conditional (Ternary) Operator in c++ [? :]

Conditional (Ternary) Operator in c++ [? :]


The condional operator can often be used instead of the if else statement. Since it is the only operator that requires three operands in c++, it is also called ternary operator.

For example, consider the assignment statement :
x = y > 3 ? 2 : 4;
If y is greater than 3 then 2 will be assigned to variable x or else the value 4 will be assigned to x.

/* A simple c++ program example to demonstrate the use of ternary operator. */


// Program 1


#include <iostream>
#include <iomanip>

using namespace std;

int main ()
{
    int first, second;

    cout << "Enter two integers." << endl;

    cout << "First" << setw (3) << ": ";
    cin >> first;

    cout << "Second" << setw (2) << ": ";
    cin >> second;

    string message = first > second ? "first is greater than second" :
                                "first is less than or equal to second";

    cout << message << endl;

    return 0;
}


Compare the above "Program 1" and below "Program 2" with "Program 2" and "Program 3" of Selection Statement (if-else if-else) in c++ respectively, for better understanding.

/* A c++ program example to demonstrate the use ternary operator.*/


// Program 2


#include <iostream>
#include <iomanip>

using namespace std;

int main ()
{
    int first, second;

    cout << "Enter two integers." << endl;

    cout << "First" << setw (3) << ": ";
    cin >> first;

    cout << "Second" << setw (2) << ": ";
    cin >> second;

    string message = first > second ? "first is greater than second" :
                                first < second ? "first is less than second" :
                                "first and second are equal";

    cout << message << endl;

    return 0;
}

Difference between Deep copy and Shallow copy in C++

So what is the difference between 'Deep copy' (sometimes referred to as 'Hard copy') and 'Shallow copy' in C++

Shallow copy: Copies the member values from one object into another.

Deep Copy: Copies the member values from one object into another. Any pointer objects are duplicated and Deep Copied.

There are some interesting discussions on Stack Overflow here and here. A related discussions here is interesting as well.

Program to demonstrate as follows:


//Program tested on Microsoft Visual Studio 2008 - Zahid Ghadialy
#include<iostream>
#include<string>

using namespace
std;

class
MyString
{

public
:
MyString(){}
MyString(string str)
{

size = str.size();
data = new char(size+1); //+1 for '\0'
memcpy(data, str.c_str(), size+1);
}

//MyString(const MyString& copy); - Use default for Shallow
void DeepCopy(const MyString& copy) //Deep Copy
{
size = copy.size;
data = new char(size+1); //+1 for '\0'
memcpy(data, copy.data, size+1);
}

int
size;
char
* data;
};


int
main()
{

MyString m1("Zahid");
cout<<"\nm1.size = "<<m1.size<<", &m1.data = "<<(void *)m1.data<<", *m1.data = "<<m1.data<<endl;

MyString m2(m1); //Uses the default copy constructor - Shallow Copy
cout<<"\nm2.size = "<<m2.size<<", &m2.data = "<<(void *)m2.data<<", *m2.data = "<<m2.data<<endl;

MyString m3; //Another default constructor
m3.DeepCopy(m1);
cout<<"\nm3.size = "<<m3.size<<", &m3.data = "<<(void *)m3.data<<", *m3.data = "<<m3.data<<endl;

return
0;
}


Output as follows:


BI at large scale

As more and more data being collected everywhere from pretty much everything a user do, such as transactions activities, social interactions, information search ... enterprises has been actively looking into ways to turn these vast amount of raw data into useful information.

BI process flow

It include the following stages of processing
  1. ETL: Extract operational data (inside enterprise or external sources) into data warehouse (typically organized in Star/Snowflake schema with Fact and Dimension tables).
  2. Data exploration: Get insight into data using simple visualization tools (e.g. histogram, summary statistics) or sophisticated OLAP tools (slice, dice, rollup, drilldown)
  3. Report generation: Produce executive reports
  4. Data mining: Extract patterns of the underlying data to form models (e.g. bayesian networks, linear regression, neural networks, decision trees, support vector machines, nearest neighbors, association rules, principal component analysis)
  5. Feedback: The model will be used to assist business decision making (predicting the future)
The gap of processing BIG data
Many data mining and machine learning algorithms are available in both commercial packages (e.g. SAS, SPSS) as well as open source libraries (e.g. Weka, R). Nevertheless, most of these ML algorithms implementation are based on fitting al data in memory and not designed to process big data (e.g. Tera byte data volume).

On the other hand, massively parallel processing platform such as Hadoop, Map/Reduce, over the last few years, has been proven in processing Terabyte or even Petabyte range of data. Although many sequential algorithm can be restructured to run in map reduce, including a big portion of machine learning algorithm, there isn't a corresponding parallel implementation of ML available in massively parallel form.

Approach 1: Apache Mahout
One approach is to "re-implement" the ML algorithm in Map/Reduce and this is the path of Apache Mahout project. Mahout seems to have implemented an impressive list of algorithms although I haven't used them for my projects yet.

Approach 2: Ensemble of parallel independent learners
This is an alternative path that doesn't require re-implementation of existing algorithms. It works in the following way.
  1. Draw samples from the Big data into many sample data sets, which can fit into the memory of a single, individual learner.
  2. Assign each sample data set to an individual learner, who use existing algorithms to learn the model. After learning, each individual learner keep their own learned model
  3. When a decision / prediction request is received, each individual learner will come up with its own prediction and then combine their results in some ways. (e.g. for classification task, the learners will vote for the predicted class and the majority wins. for regression, the average of the estimate values will be used to predict the output value)

I also found this approach can smoothly fade out outdated model. As user's behavior may change over time, same happens to the validity of a learned model. With this ensemble approach, I can have multiple learners each learn their model periodically. Everytime when a prediction is needed, I will pick the latest k models and combine the final prediction based on a time-decayed weighted voting model. Outdated model will automatically slide out the k-size window automatically.

One gotchas of sampling approach is the handling of rare events (since you may lost those rare events in sampling). In this case, stratified sampling (instead of simple random sampling) should be used.

Check out this stream