Thursday, December 29, 2011

Variables


It is fascinating how most computer languages have a concept called a variable, and yet each treats it so differently.

In algebra, we are taught that the variable is a placeholder for a value we haven't determined yet.  When you see 2X+5=25, you start thinking, "How can I solve for X?"  You don't think, "Can I modify the value of X or have it point to a different value?"

In BASIC and FORTRAN, a variable is a box into which you can put information.  Variables do not have special significance beyond this, and are often given short, obscure names that reflect their throwaway nature.  Why introduce an explaining variable if the variable is just a means to an end?

In C, a variable is an alias for a location in your computer's memory.  Global variables are expressed in terms of an absolute address, while local variables are relative to the current stack frame.  This takes the "variables are just a box" concept and says "Hey, we're all adults here, and we know that behind each of these variables is a sequence of bytes."  It is a common practice to take the address of a variable and cast it to a different type.  You can get all sorts of wonderful efficiencies provided you understand how memory is handled on the machine you are targeting, and deal with things like endianness and byte alignment.
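
To make the "sequence of bytes" idea concrete, here is a minimal sketch in C++ rather than C.  The function names are my own, and float_bits assumes a 32-bit IEEE 754 float.  Note that C++ only guarantees defined behavior when you look at an object through unsigned char* or copy its bytes with std::memcpy, so the sketch uses those instead of casting directly to an unrelated pointer type.

```cpp
#include <cstdint>
#include <cstring>

// Returns true if the least significant byte of a multi-byte integer is
// stored at the lowest address (little-endian layout).
bool is_little_endian()
{
    std::uint32_t value = 0x01020304u;
    // Viewing any object through unsigned char* is allowed by the aliasing rules.
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    return bytes[0] == 0x04;
}

// Reinterprets the bits of a float as a 32-bit integer.
// Assumption: float is 32 bits wide (checked) and uses IEEE 754 (not checked).
std::uint32_t float_bits(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // copy the raw bytes, no conversion
    return bits;
}
```

On a typical x86 machine is_little_endian() returns true, which is exactly the kind of machine detail you have to know before relying on tricks like this.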

In Python, a variable is a name which may be bound to an object.  The same object may have multiple names.  If an object is not bound to any name, it becomes a candidate for garbage collection.  If an object is mutable and is referenced by multiple names, you need to take care that a function does not inadvertently modify the state of that object through one of those names, creating a side effect.
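
The same aliasing hazard can be sketched in C++, with the caveat that this is only a rough analogue of Python's name binding and not Python itself.  Here std::shared_ptr plays the role of a name, and append_total and alias_demo are hypothetical helpers I made up for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical helper: appends the sum of the list to the list it is handed.
// The caller's list changes too, because both names refer to one object.
void append_total(const std::shared_ptr<std::vector<int>>& items)
{
    int total = 0;
    for (int v : *items) {
        total += v;
    }
    items->push_back(total);
}

std::size_t alias_demo()
{
    auto a = std::make_shared<std::vector<int>>(std::vector<int>{1, 2, 3});
    auto b = a;         // a second name bound to the same object
    append_total(b);    // mutates the object through b
    return a->size();   // the change is visible through a as well
}
```

Once the last shared_ptr to the vector goes away, the vector is destroyed, which loosely mirrors an unnamed Python object becoming eligible for collection.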

In Erlang, a variable is the named result of a calculation.  Once a value has been assigned to a variable, you cannot assign it a different value.  If you need a different value, you need to create a new variable.  This results in much more readable code, as each variable has a clear purpose, and you don't need to read through your entire program to see if some other piece of code is modifying that data.
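
You can approximate this style in C++ by declaring every intermediate result const.  This is a sketch of the idea, not Erlang semantics, and total_price is a made-up function.

```cpp
// Hypothetical pricing calculation: each intermediate result gets its own
// const name and is never reassigned.
double total_price(double unit_price, int quantity, double tax_rate)
{
    const double subtotal = unit_price * quantity;
    const double tax = subtotal * tax_rate;
    const double total = subtotal + tax;
    // subtotal = 0.0;  // would not compile: subtotal is const
    return total;
}
```

Each name is defined exactly once, so you can read any line and know what the value is without scanning the rest of the function.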

I imagine other languages have even more ways to think about variables.


Sunday, December 4, 2011

The Trouble with References in C++


References in C++ are essentially pointers that masquerade as values.  They were originally added to the language in order to facilitate operator overloading, so that you could write:

bool operator==(const Foo& lhs, const Foo& rhs);

And then use it sensibly in a program like this:

if (foo_a == foo_b) {}

But consider this line of code:

x = calc_x();

What do you think is happening here?

Now let's say we widen our view a bit:

void Foo::do_something()
{
    int& x = this->m_x;
    // some other code...
    x = calc_x();
}

Suddenly, that same line has very different implications.  We aren't just assigning a value to x; we're mutating the state of the object.

The problem with references in C++ is that it appears you are working with values when you are, in reality, using pointers.  This violates the Principle of Least Surprise.  It forces you to read all of the code in a function before being able to comprehend a single line of it.
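
To see the surprise in a form you can compile, here is a small sketch.  The Foo below is a hypothetical stand-in with a public member so the effect is easy to observe, and calc_x is a placeholder that just returns 42.  The two member functions contain the identical assignment line, and differ only in how x was declared.

```cpp
// Placeholder for whatever calculation the real code performs.
int calc_x() { return 42; }

struct Foo {
    int m_x = 0;

    // x is a copy: the assignment changes only the local variable.
    int assign_via_copy()
    {
        int x = m_x;
        x = calc_x();
        return x;
    }

    // x is a reference: the identical assignment now writes to m_x.
    int assign_via_reference()
    {
        int& x = m_x;
        x = calc_x();
        return x;
    }
};
```

After assign_via_copy() the member is still 0; after assign_via_reference() it is 42, even though the line doing the work reads the same in both.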

If you rewrite this same routine with pointers instead of references, it becomes immediately clear what is happening:

void Foo::do_something()
{
    int* x = &this->m_x;
    // some other code...
    *x = calc_x();
}

It is no wonder, then, that Google's C++ Style Guide requires that all parameters passed by reference be const.
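
As I understand that convention, inputs are passed by value or const reference and outputs are passed by pointer.  A minimal sketch, with two made-up helper functions, might look like this:

```cpp
#include <cstddef>
#include <string>

// Input passed by const reference: the callee can read it but not modify it.
std::size_t count_spaces(const std::string& text)
{
    std::size_t count = 0;
    for (char c : text) {
        if (c == ' ') {
            ++count;
        }
    }
    return count;
}

// Output passed by pointer: the caller has to write &s at the call site,
// which signals that s may be modified.
void trim_trailing_spaces(std::string* text)
{
    while (!text->empty() && text->back() == ' ') {
        text->pop_back();
    }
}
```

A call such as trim_trailing_spaces(&s) announces the mutation right where it happens, while count_spaces(s) cannot change s at all.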