8. Characters and Strings

comp.lang.c FAQ list · Question 8.1

Q: Why doesn't

	strcat(string, '!');
work?

A: There is a very real difference between characters and strings, and strcat concatenates strings.

A character constant like '!' represents a single character. A string literal between double quotes usually represents multiple characters. A string literal like "!" seems to represent a single character, but it actually contains two: the ! you requested, and the \0 which terminates all strings in C.

Characters in C are represented by small integers corresponding to their character set values (see also question 8.6). Strings are represented by arrays of characters; you usually manipulate a pointer to the first character of the array. It is never correct to use one when the other is expected. To append a ! to a string, use

	strcat(string, "!");
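When the thing to append really is a single char held in a variable, strcat cannot be used on it directly. The following is a minimal sketch of one way to handle that case; append_char is a hypothetical helper, not a standard library function, and it assumes the caller knows the total size of the destination array.

```c
#include <string.h>

/* Hypothetical helper: append the single character c to the string
 * already stored in buf, which has room for bufsize chars in total.
 * Returns 0 on success, or -1 if there is no room. */
int append_char(char *buf, size_t bufsize, char c)
{
	size_t len = strlen(buf);

	if (len + 2 > bufsize)	/* need room for c and the terminating \0 */
		return -1;
	buf[len] = c;		/* overwrite the old \0 with the character */
	buf[len + 1] = '\0';	/* and terminate the string again */
	return 0;
}
```

With char s[8] = "Hello"; the call append_char(s, sizeof s, '!') leaves "Hello!" in s, the same result as strcat(s, "!").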

See also questions 1.32, 7.2, and 16.6.

References: CT&P Sec. 1.5 pp. 9-10

comp.lang.c FAQ list · Question 8.2

Q: I'm checking a string to see if it matches a particular value. Why isn't this code working?

	char *string;
	if(string == "value") {
		/* string matches "value" */

A: Strings in C are represented as arrays of characters, and C never manipulates (assigns, compares, etc.) arrays as a whole. The == operator in the code fragment above compares two pointers--the value of the pointer variable string and a pointer to the string literal "value"--to see if they are equal, that is, if they point to the same place. They probably don't, so the comparison almost never succeeds.

To compare two strings, you generally use the library function strcmp:

	if(strcmp(string, "value") == 0) {
		/* string matches "value" */
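As a small illustration of the difference, the sketch below wraps the strcmp test in a function. The name matches_value is made up for this example. An array that holds the characters v, a, l, u, e lives at its own address, so only a comparison of contents, not of pointers, can report a match.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if s has the same contents as
 * "value", 0 otherwise. strcmp compares character by character,
 * so where the two strings are stored does not matter. */
int matches_value(const char *s)
{
	return strcmp(s, "value") == 0;
}
```

Given char buf[] = "value"; the call matches_value(buf) returns 1, even though buf and the literal inside the function are distinct arrays.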

comp.lang.c FAQ list · Question 8.3

Q: If I can say

	char a[] = "Hello, world!";
why can't I say
	char a[14];
	a = "Hello, world!";

A: Strings are arrays, and you can't assign arrays directly. Use strcpy instead:

	strcpy(a, "Hello, world!");
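strcpy does not check whether the destination is large enough, so the array must have room for every character plus the terminating \0 (14 chars for "Hello, world!"). The following sketch adds that check; set_string is a hypothetical helper written for this example, not a library routine.

```c
#include <string.h>

/* Hypothetical helper: copy src into dst, an array of dstsize chars,
 * but only if src (including its terminating \0) fits.
 * Returns 0 on success, -1 if src is too long. */
int set_string(char *dst, size_t dstsize, const char *src)
{
	if (strlen(src) + 1 > dstsize)
		return -1;
	strcpy(dst, src);
	return 0;
}
```

With char a[14]; the call set_string(a, sizeof a, "Hello, world!") succeeds and fills the array exactly.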

See also questions 1.32, 4.2, 6.5, and 7.2.

comp.lang.c FAQ list · Question 8.4

Q: I can't get strcat to work. I tried

	char *s1 = "Hello, ";
	char *s2 = "world!";
	char *s3 = strcat(s1, s2);
but I got strange results.

A: See question 7.2.

comp.lang.c FAQ list · Question 8.5

Q: What is the difference between these initializations?

	char a[] = "string literal";
	char *p  = "string literal";
My program crashes if I try to assign a new value to p[i].

A: See question 1.32.

comp.lang.c FAQ list · Question 8.6

Q: How can I get the numeric value (i.e. ASCII or other character set code) corresponding to a character, or vice versa?

A: In C, characters are represented by small integers corresponding to their values in the machine's character set. Therefore, you don't need a conversion function: if you have the character, you have its value. The following fragment:

	int c1 = 'A', c2 = 65;
	printf("%c %d %c %d\n", c1, c1, c2, c2);
prints
	A 65 A 65
on an ASCII machine.

To convert back and forth between the digit characters and the corresponding integers in the range 0-9, add or subtract the constant '0' (that is, the character value '0').
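The digit conversion can be sketched as a pair of one-line functions. The names digit_value and digit_char are invented for this example. The arithmetic relies on the digits '0' through '9' having consecutive values, which the C Standard guarantees for any character set.

```c
#include <assert.h>

/* Hypothetical helper: convert a digit character '0'..'9'
 * to the integer 0..9. */
int digit_value(char c)
{
	assert(c >= '0' && c <= '9');
	return c - '0';
}

/* Hypothetical helper: convert an integer 0..9
 * to the digit character '0'..'9'. */
char digit_char(int d)
{
	assert(d >= 0 && d <= 9);
	return (char)('0' + d);
}
```

For example, digit_value('7') is 7 and digit_char(3) is '3'. The same trick is not guaranteed to work for letters, whose codes need not be contiguous.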

See also questions 8.9, 13.1, and 20.10.

comp.lang.c FAQ list · Question 8.7

Q: Does C have anything like the ``substr'' (extract substring) routine present in other languages?

A: See question 13.3.

comp.lang.c FAQ list · Question 8.8

Q: I'm reading strings typed by the user into an array, and then printing them out later. When the user types a sequence like \n, why isn't it being handled properly?

A: Character sequences like \n are interpreted at compile time. When a backslash and an adjacent n appear in a character constant or string literal, they are translated immediately into a single newline character. (Analogous translations occur, of course, for the other character escape sequences.) When you're reading strings from the user or a file, however, no interpretation like this is performed: a backslash is read and printed just like any other character, with no particular interpretation.

(Some interpretation of the newline character may be done during run-time I/O, but for a completely different reason; see question 12.40.)
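The two cases can be compared side by side. In this sketch, translated is a literal in which the compiler has already turned \n into a newline, while typed holds what a program would have in its buffer after the user typed the four characters a, backslash, n, b. All of the names are made up for illustration.

```c
#include <string.h>

/* The compiler translates \n here: a, newline, b (3 characters). */
const char translated[] = "a\nb";

/* What the program sees when the user types a \ n b (4 characters).
 * The doubled backslash in the source stands for one real backslash. */
const char typed[] = "a\\nb";

/* Hypothetical helper: returns 1 if s contains a newline character. */
int has_newline(const char *s)
{
	return strchr(s, '\n') != NULL;
}
```

has_newline(translated) is 1, but has_newline(typed) is 0: the typed text contains a backslash followed by an n, and nothing at run time converts that pair into a newline unless the program does so itself.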

See also question 12.6.

comp.lang.c FAQ list · Question 8.9

Q: I think something's wrong with my compiler: I just noticed that sizeof('a') is 2, not 1 (i.e. not sizeof(char)).

A: Perhaps surprisingly, character constants in C are of type int, so sizeof('a') is sizeof(int) (though this is another area where C++ differs). See also question 7.8.
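The distinction is between a character constant and an object of type char. A short sketch, assuming the file is compiled as C rather than C++ (the function names are invented for this example):

```c
#include <stddef.h>

/* Size of a character constant: in C the constant has type int. */
size_t size_of_char_constant(void)
{
	return sizeof('a');
}

/* Size of an object of type char: always 1 by definition. */
size_t size_of_char_object(void)
{
	char c = 'a';
	return sizeof c;
}
```

The first function returns sizeof(int), which is 2 on a machine with 16-bit ints like the one in the question and 4 on many current machines; the second always returns 1.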

References: ISO Sec.
H&S Sec. 2.7.3 p. 29

comp.lang.c FAQ list · Question 8.10

Q: I'm starting to think about multinational character sets, and I'm worried about the implications of making sizeof(char) be 2 so that 16-bit character sets can be represented.

A: If type char were made 16 bits, sizeof(char) would still be 1, and CHAR_BIT in <limits.h> would be 16, and it would simply be impossible to declare (or allocate with malloc) a single 8-bit object.

Traditionally, a byte is not necessarily 8 bits, but merely a smallish region of memory, usually suitable for storing one character. The C Standard follows this usage, so the bytes used by malloc and sizeof can be more than 8 bits. (The Standard does not allow them to be less.)
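Since sizeof counts in units of char, the width of an object in bits comes from multiplying by CHAR_BIT. A minimal sketch follows; bits_in is a hypothetical helper name.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: number of bits occupied by an object
 * whose size, as reported by sizeof, is nbytes. */
size_t bits_in(size_t nbytes)
{
	return nbytes * CHAR_BIT;
}
```

bits_in(sizeof(char)) is CHAR_BIT on every implementation: 8 on most machines, and 16 on the imagined machine in the question, where sizeof(char) would still be 1.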

To allow manipulation of multinational character sets without requiring an expansion of type char, ANSI/ISO C defines the ``wide'' character type wchar_t, and corresponding wide string literals, and functions for manipulating and converting strings of wide characters.
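As a rough sketch of these facilities, the fragment below declares a wide string literal and converts an ordinary (multibyte) string to wide characters with the standard function mbstowcs from <stdlib.h>. The names wide_greeting and to_wide are made up for this example, and the conversion result depends on the current locale; the default "C" locale is assumed here.

```c
#include <stddef.h>
#include <stdlib.h>

/* A wide string literal: five wide characters plus a wide \0. */
const wchar_t wide_greeting[] = L"Hello";

/* Hypothetical helper: convert the multibyte string src into at most
 * dstlen wide characters in dst. Returns the number of wide characters
 * stored, not counting the terminator, or (size_t)-1 if src contains
 * an invalid multibyte sequence. */
size_t to_wide(wchar_t *dst, size_t dstlen, const char *src)
{
	return mbstowcs(dst, src, dstlen);
}
```

With wchar_t buf[8]; the call to_wide(buf, 8, "abc") returns 3 and leaves a terminated wide string in buf. Each element of buf or wide_greeting occupies sizeof(wchar_t) bytes, while type char itself is unchanged.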

See also question 7.8.

References: ISO Sec., Sec., Sec. 6.1.4, Sec. 7.1.6, Sec. 7.10.7, Sec. 7.10.8
Rationale Sec.
H&S Sec. 2.7.3 pp. 29-30, Sec. 2.7.4 p. 33, Sec. 11.1 p. 293, Secs. 11.7,11.8 pp. 303-310
