3. Expressions

comp.lang.c FAQ list · Question 3.1

Q: Why doesn't this code:

a[i] = i++;
work?

A: The subexpression i++ causes a side effect--it modifies i's value--which leads to undefined behavior since i is also referenced elsewhere in the same expression. There is no way of knowing whether the reference will happen before or after the side effect--in fact, neither obvious interpretation might hold; see question 3.9. (Note that although the language in K&R suggests that the behavior of this expression is unspecified, the C Standard makes the stronger statement that it is undefined--see question 11.33.)
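If what you wanted was one of the two ``obvious'' interpretations, write it out as separate statements so that a sequence point separates the access from the modification. Below is a minimal sketch. The helper names are hypothetical, and each helper returns the new value of i.

```c
/* Reading 1: store the old value of i into a[i], then increment i. */
int store_then_increment(int a[], int i)
{
	a[i] = i;	/* full expression ends here: sequence point */
	i++;		/* i is modified only after the store is complete */
	return i;
}

/* Reading 2: store the old value of i into a[i+1], then increment i. */
int store_next_then_increment(int a[], int i)
{
	a[i + 1] = i;
	i++;
	return i;
}
```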

References: K&R1 Sec. 2.12
K&R2 Sec. 2.12
ISO Sec. 6.3
H&S Sec. 7.12 pp. 227-9

comp.lang.c FAQ list · Question 3.2

Q: Under my compiler, the code

int i = 7;
printf("%d\n", i++ * i++);
prints 49. Regardless of the order of evaluation, shouldn't it print 56?

A: It's true that the postincrement and postdecrement operators ++ and -- perform their operations after yielding the former value. What's often misunderstood are the implications and precise definition of the word ``after.'' It is not guaranteed that an increment or decrement is performed immediately after giving up the previous value and before any other part of the expression is evaluated. It is merely guaranteed that the update will be performed sometime before the expression is considered ``finished'' (before the next ``sequence point,'' in ANSI C's terminology; see question 3.8). In the example, the compiler chose to multiply the previous value by itself and to perform both increments later.

The behavior of code which contains multiple, ambiguous side effects has always been undefined. (Loosely speaking, by ``multiple, ambiguous side effects'' we mean any combination of increment, decrement, and assignment operators (++, --, =, +=, -=, etc.) in a single expression which causes the same object either to be modified twice or modified and then inspected. This is a rough definition; see question 3.8 for a precise one, question 3.11 for a simpler one, and question 11.33 for the meaning of ``undefined.'') Don't even try to find out how your compiler implements such things, let alone write code which depends on them (contrary to the ill-advised exercises in many C textbooks); as Kernighan and Ritchie wisely point out, ``if you don't know how they are done on various machines, that innocence may help to protect you.''
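If 56 is really the result you want, put the two increments in separate full expressions so that a sequence point falls between them. In the sketch below, the helper name product_of_successive is hypothetical. It takes a pointer so the caller can see the final value of i.

```c
/* Computes the product that i++ * i++ was "supposed" to yield,
   with each increment in its own full expression. */
int product_of_successive(int *ip)
{
	int first = (*ip)++;	/* yields 7 when *ip starts at 7; *ip becomes 8 */
	int second = (*ip)++;	/* yields 8; *ip becomes 9 */

	return first * second;
}
```

Starting from i = 7, this returns 56 and leaves i equal to 9, on every conforming compiler.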

References: K&R1 Sec. 2.12 p. 50
K&R2 Sec. 2.12 p. 54
ISO Sec. 6.3
H&S Sec. 7.12 pp. 227-9
CT&P Sec. 3.7 p. 47
PCS Sec. 9.5 pp. 120-1

comp.lang.c FAQ list · Question 3.3

Q: I've experimented with the code

int i = 3;
i = i++;
on several compilers. Some gave i the value 3, and some gave 4. Which compiler is correct?

A: There is no correct answer; the expression is undefined. See questions 3.1, 3.8, 3.9, and 11.33. (Also, note that neither i++ nor ++i is the same as i+1. If you want to increment i, use i=i+1, i+=1, i++, or ++i, not some combination. See also question 3.12b.)

comp.lang.c FAQ list · Question 3.3b

Q: Here's a slick expression:

a ^= b ^= a ^= b
It swaps a and b without using a temporary.

A: Not portably, it doesn't. It attempts to modify the variable a twice between sequence points, so its behavior is undefined.

For example, it has been reported that when given the code

	int a = 123, b = 7654;
	a ^= b ^= a ^= b;
the SCO Optimizing C compiler (icc) sets b to 123 and a to 0.

See also questions 3.1, 3.8, 10.3, and 20.15c.
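The portable way to swap two values is to use a temporary. You can also make the XOR trick well-defined by splitting it into three statements. However, if both pointers refer to the same object, it silently sets that object to zero. Both function names below are hypothetical.

```c
/* Swap using a temporary: each assignment is a separate full expression. */
void swap_int(int *a, int *b)
{
	int tmp = *a;

	*a = *b;
	*b = tmp;
}

/* The XOR trick, split into separate statements so that nothing is
   modified twice between sequence points.  If a and b point to the
   same object, that object ends up as 0. */
void xor_swap(unsigned *a, unsigned *b)
{
	*a ^= *b;
	*b ^= *a;
	*a ^= *b;
}
```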

comp.lang.c FAQ list · Question 3.4

Q: Can I use explicit parentheses to force the order of evaluation I want, and control these side effects? Even if I don't, doesn't precedence dictate it?

A: Not in general.

Operator precedence and explicit parentheses impose only a partial ordering on the evaluation of an expression. In the expression

	f() + g() * h()
although we know that the multiplication will happen before the addition, there is no telling which of the three functions will be called first. In other words, precedence only partially specifies order of evaluation, where ``partially'' emphatically does not cover evaluation of operands.

Parentheses tell the compiler which operands go with which operators; they do not force the compiler to evaluate everything within the parentheses first. Adding explicit parentheses to the above expression to make it

	f() + (g() * h())
would make no difference in the order of the function calls. Similarly, adding explicit parentheses to the expression from question 3.2 to make it
	(i++) * (i++)		/* WRONG */
accomplishes nothing (since ++ already has higher precedence than *); the expression remains undefined with or without them.

When you need to ensure the order of subexpression evaluation, you may need to use explicit temporary variables and separate statements.
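For example, to guarantee that f, g, and h are called in that order, make each call in its own statement. In this sketch, f, g, and h are hypothetical stand-ins that record their call order in a made-up call_log array.

```c
/* Hypothetical functions that record the order in which they are called. */
static char call_log[3];
static int ncalls;

static int f(void) { call_log[ncalls++] = 'f'; return 1; }
static int g(void) { call_log[ncalls++] = 'g'; return 2; }
static int h(void) { call_log[ncalls++] = 'h'; return 3; }

/* Computes f() + g() * h() with the calls forced into the order
   f, g, h by saving each result in a temporary. */
int ordered_eval(void)
{
	int fv, gv, hv;

	ncalls = 0;
	fv = f();
	gv = g();
	hv = h();
	return fv + gv * hv;
}
```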

References: K&R1 Sec. 2.12 p. 49, Sec. A.7 p. 185
K&R2 Sec. 2.12 pp. 52-3, Sec. A.7 p. 200

comp.lang.c FAQ list · Question 3.5

Q: But what about the && and || operators?
I see code like ``while((c = getchar()) != EOF && c != '\n')'' ...

A: There is a special ``short-circuiting'' exception for these operators: the right-hand side is not evaluated if the left-hand side determines the outcome (i.e. is true for || or false for &&). Therefore, left-to-right evaluation is guaranteed, as it also is for the comma operator (but see question 3.7). Furthermore, all of these operators (along with ?:) introduce an extra internal sequence point (see question 3.8).

References: K&R1 Sec. 2.6 p. 38, Secs. A7.11-12 pp. 190-1
K&R2 Sec. 2.6 p. 41, Secs. A7.14-15 pp. 207-8
ISO Sec. 6.3.13, Sec. 6.3.14, Sec. 6.3.15
H&S Sec. 7.7 pp. 217-8, Sec. 7.8 pp. 218-20, Sec. 7.12.1 p. 229
CT&P Sec. 3.7 pp. 46-7

comp.lang.c FAQ list · Question 3.6

Q: Is it safe to assume that the right-hand side of the && and || operators won't be evaluated if the left-hand side determines the outcome?

A: Yes. Idioms like

	if(d != 0 && n / d > 0)
		{ /* average is greater than 0 */ }
	if(p == NULL || *p == '\0')
		{ /* no string */ }
are quite common in C, and depend on this so-called short-circuiting behavior. In the first example, in the absence of short-circuiting behavior, the right-hand side would divide by 0--and perhaps crash--if d were equal to 0. In the second example, the right-hand side would attempt to reference nonexistent memory--and perhaps crash--if p were a null pointer.
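The two idioms above can be wrapped as small functions. The names no_string and positive_quotient are made up for this sketch.

```c
#include <stddef.h>

/* Returns 1 if p is a null pointer or points to an empty string.
   Because || short-circuits, *p is evaluated only when p != NULL. */
int no_string(const char *p)
{
	return p == NULL || *p == '\0';
}

/* Returns 1 if n / d > 0.  Because && short-circuits, the division
   is performed only when d != 0. */
int positive_quotient(int n, int d)
{
	return d != 0 && n / d > 0;
}
```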

References: ISO Sec. 6.3.13, Sec. 6.3.14
H&S Sec. 7.7 pp. 217-8

comp.lang.c FAQ list · Question 3.7

Q: Why did

printf("%d %d", f1(), f2());
call f2 first? I thought the comma operator guaranteed left-to-right evaluation.

A: The comma operator does guarantee left-to-right evaluation, but the commas separating the arguments in a function call are not comma operators. The order of evaluation of the arguments to a function call is unspecified. (See question 11.33.)
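If f1 really must be called before f2, call them in separate statements and pass the saved results. In the sketch below, f1 and f2 are hypothetical stand-ins that return their position in the calling sequence. The sketch uses snprintf instead of printf only so that the output can be inspected.

```c
#include <stdio.h>

/* Hypothetical stand-ins for f1 and f2: each returns its position
   in the calling sequence (1 for the first call, 2 for the second). */
static int calls;
static int f1(void) { return ++calls; }
static int f2(void) { return ++calls; }

/* Calls f1, then f2, in separate statements, then formats the saved
   results into buf.  The text produced is always "1 2". */
void format_in_order(char *buf, size_t size)
{
	int r1, r2;

	calls = 0;
	r1 = f1();
	r2 = f2();
	snprintf(buf, size, "%d %d", r1, r2);
}
```

By contrast, snprintf(buf, size, "%d %d", f1(), f2()) could legitimately produce either "1 2" or "2 1".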

References: K&R1 Sec. 3.5 p. 59
K&R2 Sec. 3.5 p. 63
ISO Sec.
H&S Sec. 7.10 p. 224

comp.lang.c FAQ list · Question 3.8

Q: How can I understand complex expressions like the ones in this section, and avoid writing undefined ones? What's a ``sequence point''?

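A: A sequence point is a point in time at which the dust has settled and all side effects which have been seen so far are guaranteed to be complete. The sequence points listed in the C standard are:

  1. at the end of the evaluation of a full expression (a full expression is an expression statement, or any other expression which is not a subexpression within any larger expression);
  2. at the ||, &&, ?:, and comma operators; and
  3. at a function call (after the evaluation of all the arguments, and just before the actual call).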

The Standard states that

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored.

These two rather opaque sentences say several things. First, they talk about operations bounded by the ``previous and next sequence points''; such operations usually correspond to full expressions. (In an expression statement, the ``next sequence point'' is usually at the terminating semicolon, and the ``previous sequence point'' is at the end of the previous statement. An expression may also contain intermediate sequence points, as listed above.)

The first sentence rules out both the examples

	i++ * i++
	i = i++
from questions 3.2 and 3.3--in both cases, i has its value modified twice within the expression, i.e. between sequence points. (If we were to write a similar expression which did have an internal sequence point, such as
	i++ && i++
it would be well-defined, if questionably useful.)
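To see the internal sequence point at work, here is a small sketch that evaluates i++ && i++ through a pointer. The helper name is hypothetical.

```c
/* Evaluates i++ && i++ on *ip.  The && operator introduces a sequence
   point after its left operand, so the two increments are ordered. */
int and_of_increments(int *ip)
{
	return (*ip)++ && (*ip)++;
}
```

Starting from i == 1, the left operand yields 1 and leaves i at 2. The right operand then yields 2 and leaves i at 3, so the result is 1. Starting from i == 0, the left operand is false, so the right operand is never evaluated and i ends up as 1.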

The second sentence can be quite difficult to understand. It turns out that it disallows code like

	a[i] = i++
from question 3.1. (Actually, the other expressions we've been discussing are in violation of the second sentence, as well.) To see why, let's first look more carefully at what the Standard is trying to allow and disallow.

Clearly, expressions like

	a = b
	c = d + e
which read some values and use them to write others, are well-defined and legal. Clearly, expressions like
	i = i++
which modify the same value twice are abominations which needn't be allowed (or in any case, needn't be well-defined, i.e. we don't have to figure out a way to say what they do, and compilers don't have to support them). Expressions like these are disallowed by the first sentence.

It's also clear that we'd like to disallow expressions like

	a[i] = i++
which modify i and use it along the way, but not disallow expressions like
	i = i + 1
which use and modify i but only modify it later when it's reasonably easy to ensure that the final store of the final value (into i, in this case) doesn't interfere with the earlier accesses.

And that's what the second sentence says: if an object is written to within a full expression, any and all accesses to it within the same expression must be directly involved in the computation of the value to be written. This rule effectively constrains legal expressions to those in which the accesses demonstrably precede the modification. For example, the old standby i = i + 1 is allowed, because the access of i is used to determine i's final value. The example

	a[i] = i++
is disallowed because one of the accesses of i (the one in a[i]) has nothing to do with the value which ends up being stored in i (which happens over in i++), and so there's no good way to define--either for our understanding or the compiler's--whether the access should take place before or after the incremented value is stored. Since there's no good way to define it, the Standard declares that it is undefined, and that portable programs simply must not use such constructs.

See also questions 3.9 and 3.11.

References: ISO Sec., Sec. 6.3, Sec. 6.6, Annex C
Rationale Sec.
H&S Sec. 7.12.1 pp. 228-9

comp.lang.c FAQ list · Question 3.9

Q: So if I write

a[i] = i++;
and I don't care which cell of a[] gets written to, the code is fine, and i gets incremented by one, right?

A: Not necessarily! For one thing, if you don't care which cell of a[] gets written to, why write code which seems to write to a[] at all? More significantly, once an expression or program becomes undefined, all aspects of it become undefined. When an undefined expression has (apparently) two plausible interpretations, do not mislead yourself by imagining that the compiler will choose one or the other. The Standard does not require that a compiler make an obvious choice, and some compilers don't. In this case, not only do we not know whether a[i] or a[i+1] is written to, it is possible that a completely unrelated cell of the array (or any random part of memory) is written to, and it is also not possible to predict what final value i will receive. See questions 3.2, 3.3, 11.33, and 11.35.

comp.lang.c FAQ list · Question 3.10a

Q: People keep saying that the behavior of i = i++ is undefined, but I just tried it on an ANSI-conforming compiler, and got the results I expected.

A: See question 11.35.

comp.lang.c FAQ list · Question 3.10b

Q: People told me that if I evaluated an undefined expression, or accessed an uninitialized variable, I'd get a random, garbage value. But I tried it, and got zero. What's up with that?

A: It's hard to answer this question, because it's hard to see what the citation of the ``unexpected'' value of 0 is supposed to prove. C does guarantee that certain values will be initialized to 0 (see question 1.30), but for the rest (and certainly for the results of those undefined expressions), it is true that you might get garbage. The fact that you happened to get 0 one time does not mean you were wrong to have expected garbage, nor does it mean that you can depend on this happening next time (much less that you should write code which depends on it!).

Most memory blocks newly delivered by the operating system, and most as-yet-untouched stack frames, do tend to be zeroed, so the first time you access them, they may happen to contain 0, but after a program has run for a while, these regularities rapidly disappear. (And programs which unwittingly depend on a circumstantial initial value of an uninitialized variable can be very difficult to debug, because the ``expected'' values may coincidentally arise in all the small, easy test cases, while the unexpected values and the attendant crashes happen only in the larger, longer-running, much-harder-to-trace-through invocations.)
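The zero-initialization guarantee covers objects with static storage duration. Local automatic variables have no such guarantee, so the reliable approach is to initialize them explicitly. The following is a minimal sketch with hypothetical names.

```c
/* Objects with static storage duration start out as zero. */
static int counter;	/* guaranteed to be 0 before next_count is first called */

int next_count(void)
{
	int step = 1;	/* automatic: initialized explicitly, never left to chance */

	counter += step;
	return counter;
}
```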

comp.lang.c FAQ list · Question 3.11

Q: How can I avoid these undefined evaluation order difficulties if I don't feel like learning the complicated rules?

A: The easiest answer is that if you steer clear of expressions which don't have reasonably obvious interpretations, for the most part you'll steer clear of the undefined ones, too. (Of course, ``reasonably obvious'' means different things to different people. This answer works as long as you agree that a[i] = i++ and i = i++ are not ``reasonably obvious.'')

To be a bit more precise, here are some simpler rules which, though slightly more conservative than the ones in the Standard, will help to make sure that your code is ``reasonably obvious'' and equally understandable to both the compiler and your fellow programmers:

  1. Make sure that each expression modifies at most one object. By ``object'' we mean either a simple variable, or a cell of an array, or the location pointed to by a pointer (e.g. *p). A ``modification'' is either simple assignment with the = operator, or a compound assignment with an operator like +=, -=, or *=, or an increment or decrement with ++ or -- (in either pre or post forms).
  2. If an object (as defined above) appears more than once in an expression, and is the object modified in the expression, make sure that all appearances of the object which fetch its value participate in the computation of the new value which is stored. This rule allows the expression
    	i = i + 1
    because although the object i appears twice and is modified, the appearance (on the right-hand side) which fetches i's old value is used to compute i's new value.
  3. If you want to break rule 1, make sure that the several objects being modified are distinctly different, and try to limit yourself to two or at most three modifications, and of a style matching those of the following examples. (Also, make sure that you continue to follow rule 2 for each object modified.) The expression
    	c = *p++
    is allowed under this rule, because the two objects modified (c and p) are distinct. The expression
    	*p++ = c
    is also allowed, because p and *p (i.e. p itself and what it points to) are both modified but are almost certainly distinct. Similarly, both
    	c = a[i++]
    	a[i++] = c
    are allowed, because c, i, and a[i] are presumably all distinct. Finally, expressions like
    	*p++ = *q++
    	a[i++] = b[j++]
    in which three things are modified (p, q, and *p in the first expression, and i, j, and a[i] in the second), are allowed if all three objects are distinct, i.e. only if two different pointers p and q or two different array indices i and j are used.
  4. You may also break rule 1 or 2 as long as you interpose a defined sequence point operator between the two modifications, or between the modification and the access. The expression
    	(c = getchar()) != EOF && c != '\n'
    (commonly seen in a while loop while reading a line) is legal because the second access of the variable c occurs after the sequence point implied by &&. (Without the sequence point, the expression would be illegal because the access of c while comparing it to '\n' on the right does not ``determine the value to be stored'' on the left.)
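The same structure appears in a small function that measures the first line of a string. The sketch reads from a string instead of calling getchar only to make it easy to test. The name line_length is hypothetical.

```c
#include <stddef.h>

/* Counts the characters before the first newline (or the end of the
   string).  The assignment to c happens in the left operand of &&,
   and the sequence point at && guarantees that it is complete before
   c is compared with '\n' on the right. */
size_t line_length(const char *s)
{
	size_t n = 0;
	int c;

	while((c = (unsigned char)*s++) != '\0' && c != '\n')
		n++;
	return n;
}
```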

comp.lang.c FAQ list · Question 3.12a

Q: What's the difference between ++i and i++?

A: If your C book doesn't explain, get a better one. Briefly: ++i adds one to the stored value of i and ``returns'' the new, incremented value to the surrounding expression; i++ adds one to i but returns the prior, unincremented value.
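The following is a minimal demonstration that records the value each form yields from the same starting value. The function name is hypothetical.

```c
/* Stores the value yielded by ++i and by i++ for the same start value. */
void pre_and_post(int start, int *pre_result, int *post_result)
{
	int i = start;

	*pre_result = ++i;	/* i becomes start+1; yields start+1 */

	i = start;
	*post_result = i++;	/* i becomes start+1; yields start */
}
```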

comp.lang.c FAQ list · Question 3.12b

Q: If I'm not using the value of the expression, should I use ++i or i++ to increment a variable?

A: Since the two forms differ only in the value yielded, they are entirely equivalent when only their side effect is needed. (However, the prefix form is preferred in C++.) Some people will tell you that in the old days one form was preferred over the other because it utilized a PDP-11 autoincrement addressing mode, but those people are confused. An autoincrement addressing mode can only help if a pointer variable is being incremented and indirected upon, as in

	register char c, *cp;
	c = *cp++;
See also question 3.3.

References: K&R1 Sec. 2.8 p. 43
K&R2 Sec. 2.8 p. 47
ISO Sec., Sec.
H&S Sec. 7.4.4 pp. 192-3, Sec. 7.5.8 pp. 199-200

comp.lang.c FAQ list · Question 3.13

Q: I need to check whether one number lies between two others. Why doesn't

if(a < b < c)
work?

A: The relational operators, such as <, are all binary; they compare two operands and return a true or false (1 or 0) result. Therefore, the expression a < b < c compares a to b, and then checks whether the resulting 1 or 0 is less than c. (To see it more clearly, imagine that it had been written as (a < b) < c, because that's how the compiler interprets it.) To check whether one number lies between two others, use code like this:

	if(a < b && b < c)
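The following sketch contrasts the two forms. The function names are hypothetical. With a = 5, b = 1, and c = 3, the chained form is true (5 < 1 gives 0, and 0 < 3), even though b does not lie between a and c.

```c
/* What a < b < c actually means, with the implicit grouping made explicit. */
int chained(int a, int b, int c)
{
	return (a < b) < c;
}

/* What was intended: is b strictly between a and c? */
int between(int a, int b, int c)
{
	return a < b && b < c;
}
```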

References: K&R1 Sec. 2.6 p. 38
K&R2 Sec. 2.6 pp. 41-2
ISO Sec. 6.3.8, Sec. 6.3.9
H&S Secs. 7.6.4, 7.6.5 pp. 207-210

comp.lang.c FAQ list · Question 3.14

Q: Why doesn't the code

int a = 1000, b = 1000;
long int c = a * b;
work?

A: Under C's integral promotion rules, the multiplication is carried out using int arithmetic, and the result may overflow or be truncated (on a machine with 16-bit ints, for example, 1,000,000 does not fit) before being promoted and assigned to the long int left-hand side. Use an explicit cast on at least one of the operands to force long arithmetic:

	long int c = (long int)a * b;
or perhaps
	long int c = (long int)a * (long int)b;
(both forms are equivalent).

Notice that the expression (long int)(a * b) would not have the desired effect. An explicit cast of this form (i.e. applied to the result of the multiplication) is equivalent to the implicit conversion which would occur anyway when the value is assigned to the long int left-hand side, and like the implicit conversion, it happens too late, after the damage has been done.
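Here is the cast-one-operand form wrapped as a function. The name is hypothetical. Because long int is guaranteed to hold at least 2,147,483,647, the product 1,000,000 always fits.

```c
/* Converts one operand to long before multiplying; the usual arithmetic
   conversions then convert the other operand to long as well. */
long int product_as_long(int a, int b)
{
	return (long int)a * b;
}
```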

See also question 3.15.

References: K&R1 Sec. 2.7 p. 41
K&R2 Sec. 2.7 p. 44
ISO Sec.
H&S Sec. 6.3.4 p. 176
CT&P Sec. 3.9 pp. 49-50

comp.lang.c FAQ list · Question 3.14b

Q: How can I ensure that integer arithmetic doesn't overflow?

A: See question 20.6b.

comp.lang.c FAQ list · Question 3.15

Q: Why does the code

double degC, degF;
degC = 5 / 9 * (degF - 32);
keep giving me 0?

A: If both operands of a binary operator are integers, C performs an integer operation, regardless of the type of the rest of the expression. In this case, the integer operation is truncating division, yielding 5 / 9 = 0. (Note, though, that the problem of having subexpressions evaluated in an unexpected type is not restricted to division, nor for that matter to type int.) If you cast one of the operands to float or double, or use a floating-point constant, i.e.

	degC = (double)5 / 9 * (degF - 32);
	degC = 5.0 / 9 * (degF - 32);
it will work as you expect. Note that the cast must be on one of the operands; casting the result (as in (double)(5 / 9) * (degF - 32)) would not help.
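Here is a sketch of the conversion as a function, with the broken version alongside for comparison. Both function names are hypothetical.

```c
/* 5.0 is a double constant, so 5.0 / 9 is computed in floating point. */
double f_to_c(double degF)
{
	return 5.0 / 9 * (degF - 32);
}

/* The version from the question: 5 / 9 is integer division and
   yields 0, so the result is always 0 regardless of degF. */
double f_to_c_broken(double degF)
{
	return 5 / 9 * (degF - 32);
}
```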

See also question 3.14.

References: K&R1 Sec. 1.2 p. 10, Sec. 2.7 p. 41
K&R2 Sec. 1.2 p. 10, Sec. 2.7 p. 44
ISO Sec.
H&S Sec. 6.3.4 p. 176

comp.lang.c FAQ list · Question 3.16

Q: I have a complicated expression which I have to assign to one of two variables, depending on a condition. Can I use code like this?

	((condition) ? a : b) = complicated_expression;

A: No. The ?: operator, like most operators, yields a value, and you can't assign to a value. (In other words, ?: does not yield an lvalue.) If you really want to, you can try something like

	*((condition) ? &a : &b) = complicated_expression;
although this is admittedly not as pretty.
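The following is a small, self-contained demonstration of the pointer trick. The function and variable names are hypothetical.

```c
/* Stores value into x when condition is true and into y otherwise.
   ?: yields a pointer value, and the * turns it back into an lvalue. */
void choose_target(int condition, int value, int *xout, int *yout)
{
	int x = 0, y = 0;

	*(condition ? &x : &y) = value;
	*xout = x;
	*yout = y;
}
```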

References: ISO Sec. 6.3.15
H&S Sec. 7.1 pp. 179-180

comp.lang.c FAQ list · Question 3.17

Q: I have some code containing expressions like

a ? b = c : d
and some compilers are accepting it but some are not.

A: In the original definition of the language, = was of lower precedence than ?:, so early compilers tended to trip up on an expression like the one above, attempting to parse it as if it had been written

	(a ? b) = (c : d)
Since it has no other sensible meaning, however, later compilers have allowed the expression, and interpret it as if an inner set of parentheses were implied:
	a ? (b = c) : d
Here, the left-hand operand of the = is simply b, not the invalid a ? b. In fact, the grammar specified in the ANSI/ISO C Standard effectively requires this interpretation. (The grammar in the Standard is not precedence-based, and says that any expression may appear between the ? and : symbols.)

An expression like the one in the question is perfectly acceptable to an ANSI compiler, but if you ever have to compile it under an older compiler, you can always add the explicit, inner parentheses.
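The following sketch shows that, with the inner parentheses written out, the expression assigns to b only when a is true. The helper name is hypothetical.

```c
/* Returns the value of a ? (b = c) : d and reports the final value of b. */
int cond_assign(int a, int c, int d, int *bp)
{
	int b = 0;
	int result = a ? (b = c) : d;

	*bp = b;
	return result;
}
```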

References: K&R1 Sec. 2.12 p. 49
ISO Sec. 6.3.15
Rationale Sec. 3.3.15

comp.lang.c FAQ list · Question 3.18

Q: What does the warning ``semantics of `>' change in ANSI C'' mean?

A: This message represents an attempt by certain (perhaps overzealous) compilers to warn you that some code may perform differently under the ANSI C ``value preserving'' rules than under the older ``unsigned preserving'' rules.

The wording of this message is rather confusing because what has changed is not really the semantics of the > operator itself (in fact, almost any C operator can appear in the message), but rather the semantics of the implicit conversions which always occur when two dissimilar types meet across a binary operator, or when a narrow integral type must be promoted.

(If you didn't think you were using any unsigned values in your expression, the most likely culprit is strlen. In Standard C, strlen returns size_t, which is an unsigned type.)
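Here is a sketch of how strlen can cause such a surprise. The function names are hypothetical. The behavior described assumes a typical platform where size_t is at least as wide as int, so that the int operand is converted to size_t.

```c
#include <string.h>

/* Intended: is s longer than n characters?  Every string is longer
   than -1 characters, but n is converted to size_t before the
   comparison, so n == -1 becomes a huge value and the test fails. */
int longer_than_naive(const char *s, int n)
{
	return strlen(s) > n;
}

/* Handle negative n first, then compare in size_t explicitly. */
int longer_than(const char *s, int n)
{
	return n < 0 || strlen(s) > (size_t)n;
}
```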

See question 3.19.

comp.lang.c FAQ list · Question 3.19

Q: What's the difference between the ``unsigned preserving'' and ``value preserving'' rules?

A: These rules concern the behavior when an unsigned type must be promoted to a ``larger'' type. Should it be promoted to a larger signed or unsigned type? (To foreshadow the answer, it may depend on whether the larger type is truly larger.)

Under the unsigned preserving (also called ``sign preserving'') rules, the promoted type is always unsigned. This rule has the virtue of simplicity, but it can lead to surprises (see the first example below).

Under the value preserving rules, the conversion depends on the actual sizes of the original and promoted types. If the promoted type is truly larger--which means that it can represent all the values of the original, unsigned type as signed values--then the promoted type is signed. If the two types are actually the same size, then the promoted type is unsigned (as for the unsigned preserving rules).

Since the actual sizes of the types are used in making the determination, the results will vary from machine to machine. On some machines, short int is smaller than int, but on some machines, they're the same size. On some machines, int is smaller than long int, but on some machines, they're the same size.

In practice, the difference between the unsigned and value preserving rules matters most often when one operand of a binary operator is (or promotes to) int and the other one might, depending on the promotion rules, be either int or unsigned int. If one operand is unsigned int, the other will be converted to that type--almost certainly causing an undesired result if its value was negative (again, see the first example below). When the ANSI C Standard was established, the value preserving rules were chosen, to reduce the number of cases where these surprising results occur. (On the other hand, the value preserving rules also reduce the number of predictable cases, because portable programs cannot depend on a machine's type sizes and hence cannot know which way the value preserving rules will fall.)

Here is a contrived example showing the sort of surprise that can occur under the unsigned preserving rules:

	unsigned short us = 10;
	int i = -5;
	if(i > us)
		printf("whoops!\n");
The important issue is how the expression i > us is evaluated. Under the unsigned preserving rules (and under the value preserving rules on a machine where short integers and plain integers are the same size), us is promoted to unsigned int. The usual integral conversions say that when types unsigned int and int meet across a binary operator, both operands are converted to unsigned, so i is converted to unsigned int, as well. The old value of i, -5, is converted to some large unsigned value (65,531 on a 16-bit machine). This converted value is greater than 10, so the code prints ``whoops!''

Under the value preserving rules, on a machine where plain integers are larger than short integers, us is converted to a plain int (and retains its value, 10), and i remains a plain int. The expression is not true, and the code prints nothing. (To see why the values can be preserved only when the signed type is larger, remember that a value like 40,000 can be represented as an unsigned 16-bit integer but not as a signed one.)

Unfortunately, the value preserving rules do not prevent all surprises. The example just presented still prints ``whoops!'' on a machine where short and plain integers are the same size. The value preserving rules may also inject a few surprises of their own--consider the code:

	unsigned char uc = 0x80;
	unsigned long ul = 0;
	ul |= uc << 8;
	printf("0x%lx\n", ul);
Before being left-shifted, uc is promoted. Under the unsigned preserving rules, it is promoted to an unsigned int, and the code goes on to print 0x8000, as expected. Under the value preserving rules, however, uc is promoted to a signed int (as long as ints are larger than chars, which is usually the case). The intermediate result uc << 8 goes on to meet ul, which is unsigned long. The signed, intermediate result must therefore be promoted as well, and if int is only 16 bits wide (so that 0x8000 does not fit as a positive int) while long is wider, the intermediate result is sign-extended, becoming 0xffff8000 on a machine with 32-bit longs. On such a machine, the code prints 0xffff8000, which is probably not what was expected. (On machines where int is at least 32 bits wide, including those where int and long are the same size, the code prints 0x8000 under either set of rules.)

To avoid surprises (under either set of rules, or due to an unexpected change of rules), it's best to avoid mixing signed and unsigned types in the same expression, although as the second example shows, this rule is not always sufficient. You can always use explicit casts to indicate, unambiguously, exactly where and how you want conversions performed; see questions 12.42 and 16.7 for examples. (Some compilers attempt to warn you when they detect ambiguous cases or expressions which would have behaved differently under the unsigned preserving rules, although sometimes these warnings fire too often; see also question 3.18.)
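To illustrate that advice, here are both examples with explicit casts added. The function names are hypothetical. The cast of us to int assumes that its value fits in an int, which is true for the value 10 used above.

```c
/* First example: compare as signed values by converting us to int
   explicitly (assumes the value of us fits in an int). */
int greater_signed(int i, unsigned short us)
{
	return i > (int)us;
}

/* Second example: convert uc to unsigned long before shifting, so the
   shift is done in an unsigned type at least 32 bits wide and no
   sign extension can occur. */
unsigned long shifted(unsigned char uc)
{
	unsigned long ul = 0;

	ul |= (unsigned long)uc << 8;
	return ul;
}
```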

References: K&R2 Sec. 2.7 p. 44, Sec. A6.5 p. 198, Appendix C p. 260
ISO Sec., Sec., Sec.
Rationale Sec.
H&S Secs. 6.3.3, 6.3.4 pp. 174-177

Hosted by Eskimo North