From: Chris Torek
Subject: Re: fflush vs gets
Date: 2000/01/20
Message-ID: <clcm-20000120-0031@plethora.net>

Peter S. Shenkin wrote:
>I know I'm being dense here, but could someone explain to me
>what problem this person is trying to solve? I've read this
>and the FAQ section posted by Dann Corbitt, and I still don't
>get it.
>
>Why would you possibly want to discard the user's input, and
>how in the world would you know what part to discard?

The short answers are: “You don't, and you don't.”

What is going on here is a kludge piled on top of an already “bad” (in some sense) program. Instead of fixing the actual problem, certain books that purport to teach C suggest fixing the symptoms.

This is a bit like going in to the doctor with incipient pneumonia and being given a cough suppressant.

The root of the problem is the use of scanf(). The scanf() function is a large and complex beast that often does something almost but not quite entirely unlike what you desired.

The entire scanf() family works by interpreting “directives”. These directives are not well suited to interactive (“talking to a human”) input. In particular, a directive like “ ” or “\n” means: “skip as much input white space as you can find, INCLUDING NEWLINES.” Most conversion directives (including %d and %f) have an implicit skip as well. This means that if you print a prompt:

	printf("please enter a number: ");
	fflush(stdout);

and then ask for input using `scanf("%d", &var)', and the human enters a blank line, the computer simply sits there, rather than re-prompting.
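The whitespace skip is easy to see without a terminal. Here is a minimal sketch that feeds a string to sscanf() instead of stdin; the name scan_int is made up for illustration.

```c
#include <stdio.h>

/* Returns what sscanf() returns: 1 if an int was converted,
 * 0 on a matching failure, EOF if input ran out first. */
int scan_int(const char *input, int *out)
{
    /* %d skips leading whitespace, newlines included, before it
     * starts looking for digits. */
    return sscanf(input, "%d", out);
}
```

Given "\n\n\n42\n", scan_int() returns 1 and stores 42: the three blank lines are silently eaten. Given only "\n\n\n", it returns EOF because the string ran out during the skip. A terminal does not run out, so scanf() on stdin just keeps waiting.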

The next problem is that the scanf() family tends to leave unconsumed input. Anything that does not “meet the requirements” is left in place. If the user, who is supposed to enter a number, enters “three” instead of “3”, the “t” does not meet the requirements for “%d”. The entire line (“three\n”) is left in the input stream. If the user does enter a number, such as “3”, only the newline is left in the stream. If the user enters a number followed by a blank (or tab or other whitespace), the blank and newline are both left in the stream.
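To make the leftovers visible, here is a rough sketch that again uses sscanf() on a string as a stand-in for the input stream, with %n recording how far the scan got. The helper name leftover_after_int is hypothetical.

```c
#include <stdio.h>

/* Try to read an int from the start of line.  Stores the sscanf()
 * result in *converted and returns a pointer to the first character
 * that was NOT consumed. */
const char *leftover_after_int(const char *line, int *value, int *converted)
{
    int used = 0;   /* stays 0 if the scan stops before reaching %n */

    /* %n stores the number of characters consumed so far; it does not
     * count toward the return value. */
    *converted = sscanf(line, "%d%n", value, &used);
    return line + used;
}
```

With "three\n" nothing is converted and the whole line is returned as leftover. With "3\n" the leftover is "\n", and with "3 \n" it is " \n". On a real stream, those leftovers are what the next input call sees first.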

This characteristic (of leaving “extra” input behind as a trap to the unwary) leads people to write the “discard user's input” code. Unfortunately, they often use implementation-specific constructs like fflush(stdin), or broken ones like an unadorned getchar() or a scanf format like “%*[^\n]%*c”. To see why this last format is broken, read the next paragraphs.

Another substantial problem with the scanf() functions is that they interpret directives sequentially, and stop as soon as they get a “matching failure”. This seems often to surprise people. In particular, consider the format directive “%*[^\n]”. This consists of several parts. The “%” introduces the conversion. The asterisk “*” suppresses assignment of the result of the conversion, so that no additional buffer is required. The “[” specifies that the conversion is to do a character-class match, the initial “^” inverts the class, and the class itself consists only of a single character, “\n”. The “]” terminates the class and is the end of that directive. This directive thus means “match things that are not newlines”.

The tricky bit here is that any %[ directive MUST MATCH AT LEAST ONE CHARACTER. If it fails to match at least one character, the scan terminates. The call returns without looking at any further directives. Thus, if the next input character is a newline, this “%*[” directive fails, and the “%*c” NEVER OCCURS. The format sequence “%*[^\n]%*c” only clears out the rest of a line if there is at least one character before the newline.
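The failure can be sketched the same way, by counting how many characters the one-call format actually eats. The function name consumed_by_one_call is an invention for this example, and a string once more stands in for the stream.

```c
#include <stdio.h>

/* Returns how many characters of input the one-call format consumed. */
int consumed_by_one_call(const char *input)
{
    int used = 0;   /* left at 0 if the scan stops before %n */

    /* Both conversions suppress assignment, so a successful scan
     * returns 0; EOF means the input was empty. */
    if (sscanf(input, "%*[^\n]%*c%n", &used) == EOF)
        return 0;
    return used;
}
```

For "abc\nrest" this gives 4: the three letters plus the newline. For "\nrest" it gives 0: the %*[ fails on the very first character, so neither the %*c nor the %n is ever reached, and the newline is still sitting there.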

This problem can be fixed (as others noted) by using two separate calls. An initial scanf() with “%*[^\n]” will either eat everything up to but not including a newline, or fail. A subsequent “%*c” (or plain old getchar()) will consume the newline, if there was one.

That last “if” matters too: perhaps the user signalled EOF. In this case, the getchar() or scanf("%*c") might -- this decision is left to the people who write your compiler -- either immediately return EOF, or go back to the user for more input. If the implementors choose the latter, the user might have to click on “end this thing” (^D, ^Z, mouse button, front panel switch, or whatever) one extra time. This is annoying, if nothing else.
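For what it is worth, here is one way the two-call version might look. It is a sketch, not the only way to do it; the name discard_rest_of_line is made up. It checks feof() and ferror() before the second read, on the assumption that the first call sets the end-of-file indicator when it runs into EOF, so that the user is never asked for EOF a second time.

```c
#include <stdio.h>

/* Discard the rest of the current line on fp, including its newline.
 * Returns '\n' if a newline was consumed, EOF if input ended first. */
int discard_rest_of_line(FILE *fp)
{
    /* First call: eat everything up to, but not including, a newline.
     * On an empty line this is a matching failure (return value 0),
     * which is harmless here.  EOF means nothing could be read. */
    if (fscanf(fp, "%*[^\n]") == EOF)
        return EOF;

    /* If the first call stopped at end-of-file or on an error,
     * do not go back to the stream for more. */
    if (feof(fp) || ferror(fp))
        return EOF;

    /* Second call: the only thing left on this line is the newline. */
    return getc(fp);
}
```

Calling discard_rest_of_line(stdin) after a scanf() would then clear out whatever the user typed after the number, whether or not anything precedes the newline.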

(Incidentally, offhand, I think %[, %c, and %n are the only directives that do not first skip whitespace, including newlines.)
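A small sketch of the difference, with hypothetical helper names: a bare %c takes whatever character comes next, while putting a whitespace directive in front of it restores the skip.

```c
#include <stdio.h>

/* Read one character with a bare %c: no whitespace is skipped. */
int read_char_raw(const char *input)
{
    char c;

    if (sscanf(input, "%c", &c) != 1)
        return EOF;
    return (unsigned char)c;
}

/* Read one character with " %c": the leading blank in the format is
 * itself a directive that skips whitespace, newlines included. */
int read_char_skipping(const char *input)
{
    char c;

    if (sscanf(input, " %c", &c) != 1)
        return EOF;
    return (unsigned char)c;
}
```

On the input "  x", read_char_raw() returns the first blank and read_char_skipping() returns 'x'.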

So what is the “right” answer? There are various ways to do this. You can write horrendously complicated code using getchar, ungetc, and scanf, carefully checking all the return values. You can call fgets() to read a complete line, then -- having “sandboxed” it, as it were -- use sscanf() and not worry too much about bad input. You can call fgets() and use strtol() and other string-parsing functions.

The simpler approaches all share one common characteristic, though: they first read a complete line (including the terminating newline), and only then try to pick it apart. That gives users time to mull over their answer, type something in, erase it, type something else, erase that, think a bit more, and then give Regis Philbin their “final answers” by pressing ENTER or RETURN. :-) You then get the whole thing at once, and can dissect it as needed.
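As a rough illustration of the fgets()-plus-strtol() route, here is a sketch. The function names, the 256-byte buffer, and the decision to reject trailing junk are all assumptions of this example. A line longer than the buffer would be split across reads, which a real program would want to handle.

```c
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse one already-read line as a decimal long.
 * Returns 1 on success, 0 if the line is not exactly one number. */
int parse_long_line(const char *line, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(line, &end, 10);
    if (end == line)            /* no digits: blank line, "three", ... */
        return 0;
    if (errno == ERANGE)        /* does not fit in a long */
        return 0;
    while (isspace((unsigned char)*end))    /* trailing blanks, newline */
        end++;
    if (*end != '\0')           /* trailing junk such as "3x" */
        return 0;
    *out = value;
    return 1;
}

/* Prompt on out and read lines from in until one parses.
 * Returns 1 on success, 0 on end-of-file or read error. */
int prompt_for_long(FILE *in, FILE *out, const char *prompt, long *result)
{
    char line[256];     /* assumed maximum line length for this sketch */

    for (;;) {
        fputs(prompt, out);
        fflush(out);
        if (fgets(line, sizeof line, in) == NULL)
            return 0;
        if (parse_long_line(line, result))
            return 1;
        /* blank or malformed line: go around and prompt again */
    }
}
```

A call such as prompt_for_long(stdin, stdout, "please enter a number: ", &n) re-prompts on a blank line or on “three”, and there is nothing left over to discard afterwards, because each attempt consumed its whole line.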
--
In-Real-Life: Chris Torek, Berkeley Software Design Inc
El Cerrito, CA, USA
http://claw.bsdi.com/torek/ (not always up) I report spam to abuse@.
--
comp.lang.c.moderated - moderation address: clcm@plethora.net