From: Steve Summit
Subject: Re: fflush vs gets
Date: 2000/02/12
Message-ID: <clcm-20000211-0019@plethora.net>
Newsgroups: comp.lang.c,comp.lang.c.moderated

Peter S. Shenkin wrote:
> Why would you possibly want to discard the user's input,
> and how in the world would you know what part to discard?

It's likely that you've never tried to call scanf and gets in the same program. If you haven't, you're blissfully unaware of this messy little problem.

Suppose you write this trivial program:

	#include <stdio.h>
	int main()
	{
		int i;
		char string[80];
		printf("enter an integer:\n");
		scanf("%d", &i);
		printf("enter a string:\n");
		gets(string);
		printf("You typed %d and \"%s\"\n", i, string);
		return 0;
	}

Looks perfectly straightforward, right? But if you compile and run it (which I encourage you to do, if you're still unfamiliar with this problem), you'll see something weird, and you will find yourself (I guarantee it) asking question 12.18 in the comp.lang.c FAQ list: “I'm reading a number with scanf %d and then a string with gets(), but the compiler seems to be skipping the call to gets()!” (We'll have more to say later about using gets at all, but hold that thought.)

Let's look very carefully at what happens. The first printf call prints the first prompt, and we type “123” and hit the return key. The input stream now contains

	1 2 3 \n

Now we hit the scanf call, and scanf sees the format string %d indicating that we've asked it to read an int. It reads the character '1' from the input stream and says to itself “okay, that's a digit, so it can be part of an int.” It reads the characters '2' and '3', and they're digits, too. The next character is '\n', which is not a digit. So scanf does two things:

  1. It terminates processing of the %d directive; it now knows that the complete integer it has read is 123; it stores this value as requested into the location of the variable i.

  2. (This is the key point.) It pushes the \n character, which terminated the digit string but which it didn't otherwise use, back onto the input stream.

So after the first scanf call, the input stream contains

	\n

Now the second printf call prints the second prompt. Suppose we type “test” and hit the enter key. The input stream now contains

	\n t e s t \n

So now we come to the gets call. gets's job, of course, is to read one line of input, up to the next newline. But the very first character gets sees is a \n, so as far as it's concerned it's just read a blank line. It returns that blank line (an empty string, since gets always deletes the newline from the line before returning it to you), and the input stream is left containing

	t e s t \n

The third printf call prints the (somewhat surprising) results of the two inputs, and the program terminates, with the string “test” and the final newline unconsumed.

If this still isn't quite making sense, try running the program again, and typing something more than a number in response to the first prompt. That is, try typing “123 abc” or “123abc”, and then hitting the return key, when asked to “enter an integer”.

(Actually, the above description isn't quite right. No matter what you type on the first line, the input stream after the first scanf call still contains a \n, so the gets call reads it right away, without pausing for you to enter anything more. So you don't really get a chance to type “test” at all. To answer the FAQ list's question another way, the problem is not that the compiler somehow “skips” the gets call, the problem is that the gets call satisfies its need for input in an unexpected way, and skips the part about pausing the program to wait for you to type anything more.)
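If you'd rather see the leftover character directly than infer it from the program's behavior, here is a minimal sketch. The helper name peek_after_int and its FILE * parameter are my own inventions for illustration (so that it can be tried on a file as well as on stdin); the idea is simply to ask for the next character immediately after a %d conversion.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one int with %d, then report the very next
 * character still waiting on the stream.  Returns that character, or
 * EOF if no integer could be read or nothing follows it.
 */
int peek_after_int(FILE *fp, int *value)
{
	int c;

	if (fscanf(fp, "%d", value) != 1)
		return EOF;		/* no integer could be read */
	c = getc(fp);			/* what did %d leave behind? */
	if (c != EOF)
		ungetc(c, fp);		/* put it back, just as scanf did */
	return c;
}
```

Called as peek_after_int(stdin, &i) after typing “123” and return, it hands back '\n'; after typing “123abc” it hands back 'a'. Either way, that character is the first thing the next input call will see.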

With the scenario above as background, we can now answer your question, “Why would you possibly want to discard the user's input?” For better or worse, many beginning programmers use scanf to read numbers and gets to read strings. This is in large part, of course, because these functions are taught early in many books and programming classes. And this, in turn, is because these functions are superficially and seductively attractive; they seem very easy and convenient to use. But they don't play at all well together (plus they have some other problems, which we'll get to).

When the beginning programmer writes a program like the above and discovers that it doesn't work quite right, he is likely to receive the handwaving explanation (from the instructor or the textbook author) that there is some “garbage left behind on the input stream by scanf”. (We, who understand the situation more accurately, now know that the “garbage” is, in the example we walked through, simply the \n that resulted from our hitting the return key after entering the requested number.) To allow further input to proceed as expected, these instructors and authors go on to explain, the “garbage” must be “discarded”. One all-too-popular (and, again, superficially attractive) way of doing this is to call fflush(stdin), despite the fact that this is a misguided application of the standard fflush function, an application that is not guaranteed to (and in fact most certainly does not) work everywhere. But it “works” under a large number of popular PC C compilers, so the “idiom” is, unfortunately, widespread.

What's the right solution? It's extremely easy to get stuck on the fact that fflush(stdin), for some presumably stupid and pedantic reason, is not guaranteed to work everywhere. One then starts casting about looking for a “portable” replacement. The problem is that, depending on precisely what one is trying to do, there are quite a few different tacks one might take in attempting to write some well-defined or portable code to “discard garbage from stdin”. (In the general case, as you correctly ask, “How in the world would [one] know what part to discard?”)

If the definition of the “garbage from stdin” that we're trying to discard is “input from the previous line which wasn't consumed by scanf”, it turns out that there are a couple of not entirely unreasonable approaches. We could write the loop

	int c;	/* must be int, not char, so that it can hold EOF */

	while((c = getchar()) != '\n' && c != EOF)
		/* discard the character */;

to read and discard characters up to the next newline. (Notice that the comment “/* discard the character */” in this fragment does not stand in for some code I didn't write yet -- it stands in for some code I deliberately didn't write at all. The body of the loop is empty; we do nothing with the characters we're reading, thus discarding them. The \n which terminates the loop is discarded, too.)

Since that loop would clutter our code pretty badly if we had to interpose it after every scanf call, we might encapsulate it into a function we could call, perhaps called “flushline” or something. Or, recognizing that “read characters up to a newline” is precisely what the Standard functions gets and fgets already do, we might simply interpose calls to gets or fgets, reading into a dummy buffer which we ignore (and hence discard), perhaps accompanied by comments explaining that these dummy calls are to “get rid of the garbage left behind by scanf”. But these are still ugly, unclean, unsatisfying solutions. It won't be long before one of our scanf calls, for some reason, does consume a newline character after all, such that our compensating “read and discard characters up to the next newline” code will read and discard the next line of input, a real line of input, which will then be lost to the input-reading code which expected it. We could try to predict which scanf calls will and which scanf calls won't leave “garbage behind”, and sprinkle “flushline” calls after only those scanf calls which need them, but this is a hit-or-miss proposition, and a later reader will never be able to understand precisely what we're up to. There's got to be a better way.
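To make both the helper and its failure mode concrete, here is a rough sketch. flushline is the loop from above wrapped in a function; int_then_line and its trailing_space flag are hypothetical, added only so the two cases can be compared side by side. The one thing I'm relying on from the Standard is that white space in a scanf format (as in "%d ") matches any amount of white space in the input, newlines included.

```c
#include <stdio.h>

/* Discard the rest of the current line, including its newline. */
void flushline(FILE *fp)
{
	int c;	/* int, not char, so that it can hold EOF */

	while ((c = getc(fp)) != '\n' && c != EOF)
		/* discard the character */;
}

/*
 * Hypothetical demonstration: read an int, call flushline to "get rid
 * of the garbage", then read the next line into buf.  Returns 1 if a
 * line was still there to read, 0 if not.
 */
int int_then_line(FILE *fp, int trailing_space, int *value,
		  char *buf, int size)
{
	int n;

	if (trailing_space)
		n = fscanf(fp, "%d ", value);	/* the space eats the \n */
	else
		n = fscanf(fp, "%d", value);	/* the \n is left behind */
	if (n != 1)
		return 0;
	flushline(fp);
	return fgets(buf, size, fp) != NULL;
}
```

Fed the two lines “123” and “test”, the plain "%d" case works as hoped: flushline eats the stray \n and fgets returns “test”. In the "%d " case scanf has already consumed the \n, so flushline throws away “test” itself and fgets finds nothing left.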

The “better way”, as indicated in the FAQ list, is either to abandon scanf entirely, or to use it exclusively. If your input is intended to be line-based, you can read all lines of input as strings, using fgets or the like, and for those that were supposed to be numeric, convert the strings to numbers using functions like atoi, strtol, atof, or maybe even sscanf. (This is the general approach I recommend.) Or, since the problem is that it's scanf that does Not Play Well With Others, you can switch to a scheme where you use scanf for everything, using it to read your strings, too (with %s or the like).
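Here is roughly what the two schemes might look like. This is only a sketch: the function names, the 80-character buffers, and the FILE * parameters are my assumptions, and lines longer than the buffer are not handled. The first two functions read whole lines with fgets and convert with strtol; the last one stays with scanf throughout and leans on the fact that %s skips leading white space, including any leftover \n.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one line into buf, without its trailing newline.
 * Returns buf, or NULL at end-of-file. */
char *getline_trimmed(FILE *fp, char *buf, int size)
{
	if (fgets(buf, size, fp) == NULL)
		return NULL;
	buf[strcspn(buf, "\n")] = '\0';	/* drop the \n, as gets would */
	return buf;
}

/* Read one whole line and convert it to an int.  Returns 1 on success,
 * 0 at end-of-file or if the line is not exactly one number. */
int getint_line(FILE *fp, int *value)
{
	char line[80], *end;
	long n;

	if (getline_trimmed(fp, line, sizeof line) == NULL)
		return 0;
	errno = 0;
	n = strtol(line, &end, 10);
	if (end == line || *end != '\0')
		return 0;		/* no digits, or junk after them */
	if (errno == ERANGE || n < INT_MIN || n > INT_MAX)
		return 0;		/* out of range for an int */
	*value = (int)n;
	return 1;
}

/* The other scheme: scanf for everything.  Reads an int and then one
 * white-space-delimited word (not a whole line). */
int scanf_int_then_word(FILE *fp, int *value, char word[80])
{
	return fscanf(fp, "%d", value) == 1
	    && fscanf(fp, "%79s", word) == 1;
}
```

With the line-based pair, a call to getint_line(stdin, &i) followed by getline_trimmed(stdin, string, sizeof string) consumes exactly one line per call, so there is nothing left over to discard. A line such as “123 abc” is rejected as a number but is still consumed whole.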

Finally, I should add a couple of postscripts. It turns out that scanf has other problems besides the fact that it tends to leave little undigested “surprises” on the input stream, so there are other reasons to consider abandoning it. And, of course, as discussed at length in comp.lang.c of late, gets has a fatal problem which contraindicates its use for much of anything.

--

Steve Summit
scs@eskimo.com

Programming Challenge #5: Love your abstractions.
See http://www.eskimo.com/~scs/challenge/.