[This is a message I composed in response to someone who was asking how to write a function to convert binary numbers to decimal. I have paraphrased the original correspondent's queries.]

From: scs@eskimo.com (Steve Summit)
Subject: Re: please help
Date: Sat, 29 Apr 2000 12:36:07 -0700 (PDT)
Message-Id: <200004291936.MAA18560@mail.eskimo.com>

You wrote:
> I am trying to write a function that takes a binary number and returns
> its the decimal value. Is there a library function that does this?

Sort of; in fact depending on how you think about the question, there are several functions. But those functions don't work quite like you think they do, because you're thinking about the question in a different way, one which it turns out doesn't make as much sense.

> The sort of function I'm thinking of is:
>

>	#include <math.h>
>
>	int binary_to_decimal(int number)
>	{
>		int decimal = 0;
>		int i, digit;
>
>		for(i = 0; i < length(number); i++)
>		{
>			digit = substring(number, i);
>			/* where, say, substring(11011, 2) returns 0 */
>			/* but I'm not sure about this part */
>
>			if(digit == 1)
>				decimal += pow(2,i);
>		}
>
>		return decimal;
>	}

You've got the right basic algorithm there, and if your ``number'' variable (that is, the input of your function) was a string, you'd have a perfectly useful function. Notice, too, that if you changed the lines

	if(digit == 1)
		decimal += pow(2,i);
to
	decimal += digit * pow(2, i);
you could make the program work for any base, just by changing the value 2 to something else.

> but I don't know how to get substring(number, i) to work.

That's because you're thinking of a ``number'' as a string of digits. But integers are not represented internally as strings of digits.

Now, if your function did accept a string of digits (that is, if its input argument were of type string, not int), it would be easy enough to get your hands on each digit, because of course it's always easy to get your hand on the individual characters that make up a string.

(There is no standard ``substring'' function in C, although it's easy enough to write one, although as its name implies, it's almost invariably designed to solve the general problem of extracting substrings from a string, not necessarily single digits. And, once again, what it would be extracting from is a string variable, not an integer variable.)

How were you planning on passing a ``binary'' number to your function? Were you thinking of calling something like this?

	binary_to_decimal(11011);
In that call, you haven't passed the binary number one-one-oh-one-one; you've passed the decimal number eleven thousand and eleven. I suppose you could write some code that accepted decimal numbers with the digits all 0 or 1, and pretended it was a binary number, and ``converted'' it to decimal, but this would be a confusing and not very useful function. (To get at the digits, you'd end up dividing by 10, which would be a strange thing to do to extract bits which you were thinking of as being in base two.)

You were trying to write a function that accepted an int, and that returned an int. Presumably you were intending one of those ints to be in ``base 2'' and one of them to be in ``base 10''. But it really doesn't make sense to talk about what ``base'' an integer is in; an integer is just a number. The ``base'' only matters when we write numerals out on paper.

Here is an example. How many x's are there on this line?

	xxxxxxxxxxxxxxxxx
Is your answer in base two or base ten? The question doesn't make much sense. If you said ``seventeen'', your answer is not in base two or base ten; your answer is in English. Seventeen is the number of x's on the line.

We can represent numbers in different bases. If we represent the number seventeen in base ten, we get the numeral 17. If we represent it in base two, we get the numeral 10001. But the number of x's on the line didn't change when we switched from 17 base ten to 10001 base two.

I believe that the right way to think of the numbers stored in int variables as just that: numbers. It doesn't really matter, or make sense to ask, what base they're in. (In truth, on a modern computer, integers are virtually always stored in binary, but most of the time, that fact really doesn't concern us.)

Here's one way to convince yourself that ints in C are just numbers. Suppose you have an int variable

	int i;
and you print it out using printf:

	printf("%d\n", i);

Now, as you probably know, printf can also print things out in base 8 and base 16. So we could also write

	printf("%o\n", i);
or
	printf("%x\n", i);

Do we have to match the printf format we use (%d, %o, or %x) to the base of the number we're printing? No, because as I was saying, the int variable i is just a number; it doesn't have any inherent base. (Or, if it does have an inherent base, it's base two, meaning that all three printf conversions involve a conversion from base 2 to some other base.)

In fact, one answer to the question ``how do I convert a binary number to decimal?'' in C is, ``print it out using printf %d''. The answer only works if you think of ints as being base two internally, and it probably isn't the answer you expected if you were trying to convert binary to decimal explicitly, but it is a very real answer. (If the binary number you had was a string of 0's and 1's, then what you really wanted to do was just to convert that string to an int, perhaps so that you could print it back out using %d.)

If we believe (as I do) that int variables are ``just numbers'', then the only times it makes sense to talk about what base they're in are when we print them out for people in the real world to look at on paper or on the screen, or when we read them in from people in the real world who are poking at keyboards containing keys with the digits 0 through 9 on them. In those cases, just about all of the conversions are taken care for us automatically, without our hardly being aware of it:

  1. When we write a C program containing a line like
    	int i = 1234;
    
    the C compiler automatically converts those decimal digits 1 2 3 4 to an integer. If we want to, we can enter integer constants in base 8 or base 16:
    	int j = 0123;
    	int k = 0x123;
    
    and the compiler does octal or hexadecimal conversion, as appropriate.

  2. When we print an integer, using printf or the like, it's printed in base 10 if we use %d, or in base 8 or 16 if we use %o or %x.

  3. When we request numeric input from the user, and we choose to use scanf or the like, we again select the base using the format character, %d, %o, or %x.

  4. When we have a string that we want to convert to an integer, we can call the atoi function:
    	char *str = "123";
    	int i = atoi(str);
    
    atoi always does decimal conversion.

  5. Finally, if we have a string in an arbitrary base, we can convert it to an integer using the strtol function:
    	char *str2 = "1234";
    	i = strtol(str2, NULL, 7);
    
    This converts the string "1234", interpreted as a numeral in base 7, to an integer, and stores it in i. (If we then printed i back out using %d, we'd see the decimal numeral 466, which is 1234 base 7.)

So strtol is the other standard C library function that's often useful in realistic base conversion problems, and it's one of the only standard C library functions that lets you specify an arbitrary base. (You can also specify the base as 0, in which case strtol will use the same rules the C compiler uses, namely that a leading 0 indicates base 8 and a leading 0x indicates base 16. It turns out that scanf can do this, too, using the %i format.)

You'll notice that every standard function (or other language feature) that seems to have anything to do with base conversion is either converting from an integer to a string or printed representation, or is converting from a string or printed representation to an integer. You won't find a ``base conversion'' function that ``converts'' from integer to integer, because as I've explained, that ``conversion'' doesn't make sense. Integers are not stored internally as strings of digits. If, however, you want to think about the ``base'' of a numeral, that's a concept which only makes sense (in fact, which is defined) in terms of strings of digits.

If you'd like to truly understand base conversion, in the contexts where it really comes up when writing modern computer programs, here are a few functions for you to write.

  1. Write a function which accepts a string of digits and converts it to an integer, interpreting the string of digits as a numeral in base 10. This function will actually be a lot like the ``conversion'' function you were trying to write, except that (a) it accepts a string, not an int; (b) it will be easy to access the digits (that is, to perform the ``substring'' operation), because a string in C is just an array of characters, and array subscripting is easy; and (c) you'll be converting from base 10, not base 2. (Also, you can get away without calling pow() at all, if you arrange things right. If you convert the digits in left-to-right order, all you have to do is keep a running sum, and at each digit, multiply the previous sum by 10 before adding in the new digit.)

    If you write this function, you'll essentially have reimplemented the standard library atoi() function.

    One other problem you may encounter is dealing with digits as characters. Your input is a string like "1234", so the digits are characters, like '1'. But the value of the character '1' in ASCII is not 1. But it turns out that converting digit characters to their appropriate digit values is easy: just subtract the character set value of the character '0', whatever that is. '0' - '0' is obviously 0, no matter what value '0' has. Similarly, '1' - '0' is 1. So if you have a character variable c, containing one of the digit characters '0' through '9', then c - '0' is the decimal value of that digit. [The point here is that you do not need to know what the character set value of the character '0' is; all you need to do is subtract the character constant '0'.]

  2. Write a function which accepts a string of digits and an integer value b, and converts the string to an integer, interpreting the string of digits as a numeral in base b. Basically, all you'll have to do is take your function from #1 and change ``10'' to ``b''. When you're done, you'll have reimplemented part of the standard library strtol() function. [And you can use it, if you like, to perform ``binary to decimal conversion'', by calling it with a base of 2. As we've discussed, this is actually ``binary to number conversion'', not ``binary to decimal''.]

  3. Write a function which accepts an integer and creates a string of digits forming the base 10 representation of the integer. (Your compiler may already have such a function, perhaps called ``itoa'', although it's not a Standard function.) As you write this function, you'll essentially be reimplementing the task which printf performs when it prints something using %d.

    It's actually easy enough to compute the digits of a number's base-10 representation: you can take i % 10 to pick off the rightmost digit, and then say i = i / 10 to throw away the rightmost digit (since you already picked it off), and then say i % 10 again to pick off what had been the next-to-rightmost digit, etc.

    One tricky part of writing this function is that it gives you the digits the wrong way around: it picks them off in right-to-left order, while you probably wanted to print them out, or store them in string, in left-to-right order. There are various ways around this problem. (Another tricky part is converting digit values to characters, but here all you have to do is use the inverse of the trick mentioned under #1 above: if d is a digit value you've computed, then d + '0' is the character value you want.)

  4. Write a function which accepts an integer and creates a string of digits forming the base b representation of the integer, where b is another int argument to the function. (Some compilers come with an itoa function which also accepts a base.) Basically, all you'll have to do is take your function from #3 and change ``10'' to ``b''.

Finally, if you do write these functions, and get them to work, take a look: your functions in #1 and #2 convert from base 10 or base b, and your functions in #3 and #4 convert to base 10 or base b. But what base do your functions in #1 and #2 convert to? What base do your functions in #3 and #4 convert from? You can't tell; all they really convert to and from is ``ints''. The base (if it's even meaningful to ask what it is) is implicit; the details of the internal representation of integers are all taken care of for you by your C compiler, the machine code it generates, and by your machine's CPU. You did a certain amount of multiplying or dividing by 10 or b, but the ``other'' base was essentially built in to the multiplication and division operators. If you had to ``port'' the code you wrote from a machine that used binary integers internally to a hypothetical one that used decimal integers internally, you wouldn't have to change a thing. All of these facts underlie my claim that it doesn't make much sense to ask what base these ints are in; they're just numbers.

See also the comp.lang.c FAQ list, question 20.10 (and 13.1, while you're at it).

Steve Summit
scs@eskimo.com