2. Structures, Unions, and Enumerations

comp.lang.c FAQ list · Question 2.1

Q: What's the difference between these two declarations?

	struct x1 { ... };
	typedef struct { ... } x2;


A: The first form declares a structure tag; the second declares a typedef. The main difference is that the second declaration is of a slightly more abstract type--its users don't necessarily know that it is a structure, and the keyword struct is not used when declaring instances of it:

	x2 b;

Instances of structures declared with tags, on the other hand, must be declared with the

	struct x1 a;
form.

(It's also possible to play it both ways:

	typedef struct x3 { ... } x3;
It's legal, if potentially obscure, to use the same name for both the tag and the typedef, since they live in separate namespaces. See question 1.29.)


comp.lang.c FAQ list · Question 2.2

Q: Why doesn't

struct x { ... };
x thestruct;
work?


A: C is not C++. Typedef names are not automatically generated for structure tags. Either declare structure instances using the struct keyword:

	struct x thestruct;
or declare a typedef when you declare a structure:
	typedef struct { ... } x;

	x thestruct;
See also questions 1.14 and 2.1.


comp.lang.c FAQ list · Question 2.3

Q: Can a structure contain a pointer to itself?


A: Most certainly. A problem can arise if you try to use typedefs; see questions 1.14 and 1.15.
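For example, here is a minimal sketch of the classic case, a singly linked list node whose next field points to another node of the same type. The names struct node, push, list_sum, and list_free are illustrative only. Note that the structure refers to itself by its tag, which is already declared at that point; a typedef name introduced by the same declaration would not yet be in scope.

```c
#include <stdlib.h>

/* A node containing a pointer to another node of the same type.
   Inside the braces the tag (struct node) must be used. */
struct node {
	int value;
	struct node *next;
};

/* Hypothetical helper: allocate a node holding value in front of head.
   Returns the new head, or NULL if malloc fails (head is left intact). */
struct node *push(struct node *head, int value)
{
	struct node *n = malloc(sizeof *n);
	if(n == NULL)
		return NULL;
	n->value = value;
	n->next = head;
	return n;
}

/* Walk the list by following the self-referential next pointers. */
int list_sum(const struct node *p)
{
	int sum = 0;
	for(; p != NULL; p = p->next)
		sum += p->value;
	return sum;
}

/* Free every node, saving next before each node is released. */
void list_free(struct node *p)
{
	while(p != NULL) {
		struct node *next = p->next;
		free(p);
		p = next;
	}
}
```

Because push returns NULL when malloc fails, a careful caller would check the result before overwriting its old head pointer.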




comp.lang.c FAQ list · Question 2.4

Q: How can I implement opaque (abstract) data types in C?


A: One good way is for clients to use structure pointers (perhaps additionally hidden behind typedefs) which point to structure types which are not publicly defined. In other words, a client uses structure pointers (and calls functions accepting and returning structure pointers) without knowing anything about what the fields of the structure are. (As long as the details of the structure aren't needed--e.g. as long as the -> and sizeof operators are not used--C is perfectly happy to handle pointers to structures of incomplete type.) Only within the source files implementing the abstract data type are complete declarations for the structures actually in scope.

See also question 11.5.
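Here is a minimal sketch of the technique, with both halves squeezed into one listing so that it compiles on its own. The type name Counter and the functions counter_new, counter_incr, counter_value, and counter_free are hypothetical. In a real program the first part would live in a header such as counter.h, and the structure definition would appear only in the implementation file (say, counter.c), so that clients could never apply -> or sizeof to it.

```c
#include <stdlib.h>

/* ---- What a public header (e.g. counter.h) might contain ---- */
/* Incomplete type: clients can hold and pass pointers, but see no fields. */
typedef struct counter Counter;

Counter *counter_new(int start);
void counter_incr(Counter *c);
int counter_value(const Counter *c);
void counter_free(Counter *c);

/* ---- What only the implementation file (e.g. counter.c) would contain ---- */
struct counter {
	int value;		/* invisible to clients */
};

Counter *counter_new(int start)
{
	Counter *c = malloc(sizeof *c);	/* sizeof is fine here: type is complete */
	if(c != NULL)
		c->value = start;
	return c;
}

void counter_incr(Counter *c)
{
	c->value++;
}

int counter_value(const Counter *c)
{
	return c->value;
}

void counter_free(Counter *c)
{
	free(c);
}
```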





comp.lang.c FAQ list · Question 2.4b

Q: Is there a good way of simulating OOP-style inheritance, or other OOP features, in C?


A: It's straightforward to implement simple ``methods'' by placing function pointers in structures. You can make various clumsy, brute-force attempts at inheritance using the preprocessor or by having structures contain ``base types'' as initial subsets, but it won't be perfect. There's obviously no operator overloading, and overriding (i.e. of ``methods'' in ``derived classes'') would have to be done by hand.

Obviously, if you need ``real'' OOP, you'll want to use a language that supports it, such as C++.
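To make the ``function pointers in structures'' and ``base type as initial subset'' ideas concrete, here is a minimal sketch. All names (struct shape, struct shape_ops, struct rect, rect_init, shape_area) are invented for illustration. The conversion from a struct shape * back to a struct rect * relies on the Standard's guarantee that a pointer to a structure, suitably converted, points to its initial member, and vice versa.

```c
/* The "method table": one function pointer per operation. */
struct shape;
struct shape_ops {
	double (*area)(const struct shape *self);
};

/* "Base class": every shape carries a pointer to its method table. */
struct shape {
	const struct shape_ops *ops;
};

/* "Derived class": the base is the FIRST member, so a struct rect *
   and a pointer to its base member can be converted back and forth. */
struct rect {
	struct shape base;
	double w, h;
};

static double rect_area(const struct shape *self)
{
	/* self points at the base member, which is the initial member */
	const struct rect *r = (const struct rect *)self;
	return r->w * r->h;
}

static const struct shape_ops rect_ops = { rect_area };

/* Hypothetical "constructor": install the method table by hand. */
void rect_init(struct rect *r, double w, double h)
{
	r->base.ops = &rect_ops;
	r->w = w;
	r->h = h;
}

/* "Virtual" call: dispatch through whatever table the object carries. */
double shape_area(const struct shape *s)
{
	return s->ops->area(s);
}
```

Adding another ``derived class'' means writing another ops table and initializer by hand, which is exactly the kind of manual overriding mentioned above.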

Additional links: An article by James Hu exploring some possibilities in more detail.




comp.lang.c FAQ list · Question 2.5

Q: Why does the declaration

extern int f(struct x *p);
give me an obscure warning message about ``struct x declared inside parameter list''?


A: See question 11.5.




comp.lang.c FAQ list · Question 2.6

Q: I came across some code that declared a structure like this:

struct name {
	int namelen;
	char namestr[1];
};
and then did some tricky allocation to make the namestr array act like it had several elements, with the number recorded by namelen. How does this work? Is it legal or portable?


A: It's not clear if it's legal or portable, but it is rather popular. An implementation of the technique might look something like this:

#include <stdlib.h>
#include <string.h>

struct name *makename(char *newname)
{
	struct name *ret =
		malloc(sizeof(struct name)-1 + strlen(newname)+1);
				/* -1 for initial [1]; +1 for \0 */
	if(ret != NULL) {
		ret->namelen = strlen(newname);
		strcpy(ret->namestr, newname);
	}

	return ret;
}
This function allocates an instance of the name structure with the size adjusted so that the namestr field can hold the requested name (not just one character, as the structure declaration would suggest).

Despite its popularity, the technique is also somewhat notorious: Dennis Ritchie has called it ``unwarranted chumminess with the C implementation,'' and an official interpretation has deemed that it is not strictly conforming with the C Standard, although it does seem to work under all known implementations. (Compilers which check array bounds carefully might issue warnings.)

Another possibility is to declare the variable-size element very large, rather than very small. The above example could be rewritten like this:

#include <stdlib.h>
#include <string.h>

#define MAXSIZE 100

struct name {
	int namelen;
	char namestr[MAXSIZE];
};

struct name *makename(char *newname)
{
	struct name *ret =
		malloc(sizeof(struct name)-MAXSIZE+strlen(newname)+1);
								/* +1 for \0 */
	if(ret != NULL) {
		ret->namelen = strlen(newname);
		strcpy(ret->namestr, newname);
	}

	return ret;
}
where MAXSIZE is larger than any name which will be stored. However, it looks like this technique is disallowed by a strict interpretation of the Standard as well. Furthermore, either of these ``chummy'' structures must be used with care, since the programmer knows more about their size than the compiler does.

Of course, to be truly safe, the right thing to do is use a character pointer instead of an array:

#include <stdlib.h>
#include <string.h>

struct name {
	int namelen;
	char *namep;
};

struct name *makename(char *newname)
{
	struct name *ret = malloc(sizeof(struct name));
	if(ret != NULL) {
		ret->namelen = strlen(newname);
		ret->namep = malloc(ret->namelen + 1);
		if(ret->namep == NULL) {
			free(ret);
			return NULL;
		}
		strcpy(ret->namep, newname);
	}

	return ret;
}
(Obviously, the ``convenience'' of having the length and the string stored in the same block of memory has now been lost, and freeing instances of this structure will require two calls to free; see question 7.23.)

When the data type being stored is characters, as in the above examples, it is straightforward to coalesce the two calls to malloc into one, to preserve contiguity (and therefore rescue the ability to use a single call to free):

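struct name *makename(char *newname)
{
	size_t len = strlen(newname);
	/* one block: the structure, immediately followed by the characters */
	char *buf = malloc(sizeof(struct name) + len + 1);
	struct name *ret = (struct name *)buf;	/* malloc's result is suitably aligned */
	if(ret == NULL)
		return NULL;
	ret->namelen = len;
	ret->namep = buf + sizeof(struct name);	/* string starts just past the struct */
	strcpy(ret->namep, newname);

	return ret;
}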

However, piggybacking a second region onto a single malloc call like this is only portable if the second region is to be treated as an array of char. For any larger type, alignment (see questions 2.12 and 16.7) becomes significant and would have to be preserved.

C99 introduces the concept of a flexible array member, which allows the size of an array to be omitted if it is the last member in a structure, thus providing a well-defined solution.
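Under C99, the makename example above might be rewritten with a flexible array member roughly as follows. This is a sketch assuming a C99 (or later) compiler; it keeps the structure and function names of the earlier examples, though newname is now const char *. Since sizeof(struct name) does not count the flexible member, the allocation simply adds room for the characters and the terminating \0, and a single call to free releases everything.

```c
#include <stdlib.h>
#include <string.h>

/* C99 flexible array member: no size given, and it must be the last member. */
struct name {
	int namelen;
	char namestr[];
};

struct name *makename(const char *newname)
{
	size_t len = strlen(newname);
	/* sizeof(struct name) excludes namestr; add len chars plus \0 */
	struct name *ret = malloc(sizeof(struct name) + len + 1);
	if(ret != NULL) {
		ret->namelen = (int)len;
		memcpy(ret->namestr, newname, len + 1);	/* copies the \0 too */
	}
	return ret;
}
```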

References: Rationale Sec. 3.5.4.2
C9X Sec. 6.5.2.1




comp.lang.c FAQ list · Question 2.7

Q: I heard that structures could be assigned to variables and passed to and from functions, but K&R1 says not.


A: What K&R1 said (though this was quite some time ago by now) was that the restrictions on structure operations would be lifted in a forthcoming version of the compiler, and in fact the operations of assigning structures, passing structures as function arguments, and returning structures from functions were fully functional in Ritchie's compiler even as K&R1 was being published. A few ancient compilers may have lacked these operations, but all modern compilers support them, and they are part of the ANSI C standard, so there should be no reluctance to use them.

(Note that when a structure is assigned, passed, or returned, the copying is done monolithically. This means that the copies of any pointer fields will point to the same place as the original. In other words, the data pointed to is not copied.)
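A small sketch showing both points at once: the called function gets its own copy of the structure, but the copied pointer still refers to the caller's characters. The structure struct rec and the function modify_copy are hypothetical.

```c
struct rec {
	int n;
	char *s;	/* only the pointer is copied, never the characters */
};

/* Structures may be passed and returned by value. */
struct rec modify_copy(struct rec r)
{
	r.n++;		/* changes this function's private copy only */
	r.s[0] = 'X';	/* changes the caller's characters through the copied pointer */
	return r;
}
```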

See the code fragments in question 14.11 for an example of structure operations in action.

References: K&R1 Sec. 6.2 p. 121
K&R2 Sec. 6.2 p. 129
ISO Sec. 6.1.2.5, Sec. 6.2.2.1, Sec. 6.3.16
H&S Sec. 5.6.2 p. 133




comp.lang.c FAQ list · Question 2.8

Q: Is there a way to compare structures automatically?


A: No. There is not a good way for a compiler to implement structure comparison (i.e. to support the == operator for structures) which is consistent with C's low-level flavor. A simple byte-by-byte comparison could founder on random bits present in unused ``holes'' in the structure (such padding is used to keep the alignment of later fields correct; see question 2.12). A field-by-field comparison might require unacceptable amounts of repetitive code for large structures. Any compiler-generated comparison could not be expected to compare pointer fields appropriately in all cases: for example, it's often appropriate to compare char * fields with strcmp rather than == (see also question 8.2).

If you need to compare two structures, you'll have to write your own function to do so, field by field.
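A minimal sketch of such a function, for a hypothetical struct person. Because it looks only at named fields, whatever happens to be in the padding bytes never matters, and the char * field is compared by contents with strcmp rather than by address.

```c
#include <string.h>

struct person {
	char *name;	/* compared by contents, not by pointer value */
	int age;
	double height;
};

/* Hypothetical field-by-field comparison: 1 if equal, 0 if not. */
int person_equal(const struct person *a, const struct person *b)
{
	return strcmp(a->name, b->name) == 0 &&
		a->age == b->age &&
		a->height == b->height;
}
```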

References: K&R2 Sec. 6.2 p. 129
Rationale Sec. 3.3.9
H&S Sec. 5.6.2 p. 133




comp.lang.c FAQ list · Question 2.9

Q: How are structure passing and returning implemented?


A: When structures are passed as arguments to functions, the entire structure is typically pushed on the stack, using as many words as are required. (Programmers often choose to use pointers to structures instead, precisely to avoid this overhead.) Some compilers merely pass a pointer to the structure, though they may have to make a local copy to preserve pass-by-value semantics.

Structures are often returned from functions in a location pointed to by an extra, compiler-supplied ``hidden'' argument to the function. Some older compilers used a special, static location for structure returns, although this made structure-valued functions non-reentrant, which ANSI C disallows.

References: ISO Sec. 5.2.3




comp.lang.c FAQ list · Question 2.10

Q: How can I pass constant values to functions which accept structure arguments? How can I create nameless, immediate, constant structure values?


A: Traditional C had no way of generating anonymous structure values; you had to use a temporary structure variable or a little structure-building function; see question 14.11 for an example.

C99 introduces ``compound literals'', one form of which provides for structure constants. For example, to pass a constant coordinate pair to a hypothetical plotpoint function which expects a struct point, you can call

	plotpoint((struct point){1, 2});
Combined with ``designated initializers'' (another C99 feature), it is also possible to specify member values by name:
	plotpoint((struct point){.x=1, .y=2});
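Here is a self-contained sketch of both calls alongside the traditional structure-building function. The definition of struct point and the body of plotpoint are assumptions made for illustration (this stand-in plotpoint just combines the coordinates into an int so the result can be checked); makepoint is likewise hypothetical.

```c
/* Assumed definition; the text only says plotpoint expects a struct point. */
struct point {
	int x, y;
};

/* Hypothetical stand-in for plotpoint: encode the coordinates as one int. */
int plotpoint(struct point p)
{
	return p.x * 100 + p.y;
}

/* Traditional (pre-C99) alternative: a little structure-building function. */
struct point makepoint(int x, int y)
{
	struct point p;
	p.x = x;
	p.y = y;
	return p;
}
```

With this in place, plotpoint((struct point){1, 2}), plotpoint((struct point){.y=2, .x=1}), and plotpoint(makepoint(1, 2)) all pass the same value. A compound literal that appears inside a function body creates an unnamed object with automatic storage duration.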

See also question 4.10.

References: C9X Sec. 6.3.2.5, Sec. 6.5.8




comp.lang.c FAQ list · Question 2.11

Q: How can I read/write structures from/to data files?


A: It is relatively straightforward to write a structure out using fwrite:

	fwrite(&somestruct, sizeof somestruct, 1, fp);
and a corresponding fread invocation can read it back in. What happens here is that fwrite receives a pointer to the structure, and writes (or fread correspondingly reads) the memory image of the structure as a stream of bytes. The sizeof operator determines how many bytes the structure occupies.

(The call to fwrite above is correct under an ANSI compiler as long as a prototype for fwrite is in scope, usually because <stdio.h> is #included.)

However, data files written as memory images in this way will not be portable, particularly if they contain floating-point fields or pointers. The memory layout of structures is machine and compiler dependent. Different compilers may use different amounts of padding (see question 2.12), and the sizes and byte orders of fundamental types vary across machines. Therefore, structures written as memory images cannot necessarily be read back in by programs running on other machines (or even compiled by other compilers), and this is an important concern if the data files you're writing will ever be interchanged between machines. See also questions 2.12 and 20.5.

Also, if the structure contains any pointers (char * strings, or pointers to other data structures), only the pointer values will be written, and they are most unlikely to be valid when read back in. Finally, note that for widespread portability you must use the "b" flag when opening the files; see question 12.38.

A more portable solution, though it's a bit more work initially, is to write a pair of functions for writing and reading a structure, field-by-field, in a portable (perhaps even human-readable) way.
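For instance, here is a minimal sketch of such a pair, for a hypothetical struct rec with an unsigned long id and an int flags field. The file format (4 bytes of id followed by 2 bytes of flags, most significant byte first) is an arbitrary choice for illustration, and it assumes id fits in 32 bits and flags is nonnegative and fits in 16. Because each byte is produced by shifting and masking, the file does not depend on the host's padding, type sizes, or byte order. The stream should still be opened in binary mode ("wb", "rb").

```c
#include <stdio.h>

struct rec {
	unsigned long id;	/* assumed to fit in 32 bits */
	int flags;		/* assumed nonnegative, fits in 16 bits */
};

/* Write v as nbytes bytes, most significant byte first. */
static int put_be(FILE *fp, unsigned long v, int nbytes)
{
	while(nbytes-- > 0)
		if(putc((int)((v >> (8 * nbytes)) & 0xFF), fp) == EOF)
			return -1;
	return 0;
}

/* Read nbytes bytes, most significant first, into *vp. */
static int get_be(FILE *fp, unsigned long *vp, int nbytes)
{
	unsigned long v = 0;
	while(nbytes-- > 0) {
		int c = getc(fp);
		if(c == EOF)
			return -1;
		v = (v << 8) | (unsigned long)c;
	}
	*vp = v;
	return 0;
}

/* Field-by-field writer: 4 bytes of id, then 2 bytes of flags. */
int write_rec(FILE *fp, const struct rec *r)
{
	if(put_be(fp, r->id, 4) != 0 ||
			put_be(fp, (unsigned long)r->flags, 2) != 0)
		return -1;
	return 0;
}

/* Matching reader; returns -1 on EOF or error. */
int read_rec(FILE *fp, struct rec *r)
{
	unsigned long id, flags;
	if(get_be(fp, &id, 4) != 0 || get_be(fp, &flags, 2) != 0)
		return -1;
	r->id = id;
	r->flags = (int)flags;
	return 0;
}
```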

References: H&S Sec. 15.13 p. 381




comp.lang.c FAQ list · Question 2.12

Q: Why is my compiler leaving holes in structures, wasting space and preventing ``binary'' I/O to external data files? Can I turn this off, or otherwise control the alignment of structure fields?


A: Many machines access values in memory most efficiently when the values are appropriately aligned. (For example, on a byte-addressed machine, short ints of size 2 might best be placed at even addresses, and long ints of size 4 at addresses which are a multiple of 4.) Some machines cannot perform unaligned accesses at all, and require that all data be appropriately aligned.

Therefore, if you declare a structure like

	struct {
		char c;
		int i;
	};
the compiler will usually leave an unnamed, unused hole between the char and int fields, to ensure that the int field is properly aligned. (This incremental alignment of the second field based on the first relies on the fact that the structure itself is always properly aligned, with the most conservative alignment requirement. The compiler guarantees this alignment for structures it allocates, as does malloc.)

Your compiler may provide an extension to give you control over the packing of structures (i.e. whether they are padded or not), perhaps with a #pragma (see question 11.20), but there is no standard method.

If you're worried about wasted space, you can minimize the effects of padding by ordering the members of a structure based on their base types, from largest to smallest. You can sometimes get more control over size and alignment by using bit-fields, although they have their own drawbacks. (See question 2.26.)
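A sketch that uses offsetof (see question 2.14) to make the padding visible, comparing the text's char-then-int order (with a trailing char added) against a largest-first ordering of the same members. The structure names are hypothetical, and the printed numbers depend on the implementation; on a typical machine with 4-byte ints, struct mixed will often come out larger than struct sorted even though both hold the same members.

```c
#include <stddef.h>
#include <stdio.h>

struct mixed {		/* char, int, char: likely holes after each char */
	char c;
	int i;
	char d;
};

struct sorted {		/* same members, largest first */
	int i;
	char c;
	char d;
};

/* Print where the int landed and how big each structure is.
   The exact numbers are implementation-dependent. */
void show_layout(void)
{
	printf("mixed:  i at offset %lu, size %lu\n",
		(unsigned long)offsetof(struct mixed, i),
		(unsigned long)sizeof(struct mixed));
	printf("sorted: c at offset %lu, size %lu\n",
		(unsigned long)offsetof(struct sorted, c),
		(unsigned long)sizeof(struct sorted));
}
```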

See also questions 2.13, 16.7, and 20.5.

Additional links:

A bit more explanation of ``alignment'' and why it requires padding

Additional ideas on working with alignment and padding by Eric Raymond, couched in the form of six new FAQ list questions

Corrections to the above from Norm Diamond and Clive Feather

References: K&R2 Sec. 6.4 p. 138
H&S Sec. 5.6.4 p. 135




comp.lang.c FAQ list · Question 2.13

Q: Why does sizeof report a larger size than I expect for a structure type, as if there were padding at the end?


A: Padding at the end of a structure may be necessary to preserve alignment when an array of contiguous structures is allocated. Even when the structure is not part of an array, the padding remains, so that sizeof can always return a consistent size. See also question 2.12.
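A sketch illustrating why the trailing padding matters: the distance between consecutive elements of an array is always exactly the size of the element type. For a hypothetical struct tail holding an int followed by a char, a typical machine with 4-byte, 4-aligned ints pads the structure to 8 bytes rather than 5, so that arr[1].i is aligned too.

```c
#include <stddef.h>

struct tail {
	int i;
	char c;		/* typically followed by trailing padding */
};

/* Byte distance between two adjacent array elements; this always
   equals sizeof(struct tail), padding included. */
size_t element_stride(void)
{
	struct tail arr[2];
	return (size_t)((char *)&arr[1] - (char *)&arr[0]);
}
```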

References: H&S Sec. 5.6.7 pp. 139-40




comp.lang.c FAQ list · Question 2.14

Q: How can I determine the byte offset of a field within a structure?


A: ANSI C defines the offsetof() macro in <stddef.h>, which lets you compute the offset of field f in struct s as offsetof(struct s, f). If for some reason you have to code this sort of thing yourself, one possibility is

	#define offsetof(type, f) ((size_t) \
		((char *)&((type *)0)->f - (char *)(type *)0))

This implementation is not 100% portable; some compilers may legitimately refuse to accept it.

(The complexities of the definition above bear a bit of explanation. The subtraction of a carefully converted null pointer is supposed to guarantee that a simple offset is computed even if the internal representation of the null pointer is not 0. The casts to (char *) arrange that the offset so computed is a byte offset. The nonportability is in pretending, if only for the purposes of address calculation, that there is an instance of the type sitting at address 0. Note, however, that since the pretend instance is not actually referenced, an access violation is unlikely.)

References: ISO Sec. 7.1.6
Rationale Sec. 3.5.4.2
H&S Sec. 11.1 pp. 292-3




comp.lang.c FAQ list · Question 2.15

Q: How can I access structure fields by name at run time?


A: Keep track of the field offsets as computed using the offsetof() macro (see question 2.14). If structp is a pointer to an instance of the structure, and field f is an int having offset offsetf, f's value can be set indirectly with

	*(int *)((char *)structp + offsetf) = value;
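Extending that idea, here is a sketch of a lookup table pairing field names with offsets, so that a field can be selected by a name known only at run time. The structure struct config, the table, and set_field are hypothetical, and the sketch assumes every listed field is an int; a table for mixed field types would need a type code as well.

```c
#include <stddef.h>
#include <string.h>

struct config {		/* hypothetical structure; all fields are int */
	int width;
	int height;
	int depth;
};

/* Name-to-offset table, computed at compile time with offsetof. */
static const struct {
	const char *name;
	size_t offset;
} config_fields[] = {
	{ "width",  offsetof(struct config, width) },
	{ "height", offsetof(struct config, height) },
	{ "depth",  offsetof(struct config, depth) },
};

/* Set the int field called name; 0 on success, -1 if no such field. */
int set_field(struct config *structp, const char *name, int value)
{
	size_t k;
	for(k = 0; k < sizeof config_fields / sizeof config_fields[0]; k++) {
		if(strcmp(config_fields[k].name, name) == 0) {
			*(int *)((char *)structp + config_fields[k].offset) = value;
			return 0;
		}
	}
	return -1;
}
```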



comp.lang.c FAQ list · Question 2.16

Q: Does C have an equivalent to Pascal's with statement?


A: See question 20.23.




comp.lang.c FAQ list · Question 2.17

Q: If an array name acts like a pointer to the base of an array, why isn't the same thing true of a structure?


A: The rule (see question 6.3) that causes array references to ``decay'' into pointers is a special case which applies only to arrays, and reflects their ``second class'' status in C. (An analogous rule applies to functions.) Structures, however, are first class objects: when you mention a structure, you get the entire structure.




comp.lang.c FAQ list · Question 2.18

Q: This program works correctly, but it dumps core after it finishes. Why?

	struct list {
		char *item;
		struct list *next;
	}

	/* Here is the main program. */

	main(argc, argv)
	{ ... }


A: A missing semicolon at the end of the structure declaration causes main to be declared as returning a structure. (The connection is hard to see because of the intervening comment.) Since structure-valued functions are usually implemented by adding a hidden return pointer (see question 2.9), the generated code for main() tries to accept three arguments, although only two are passed (in this case, by the C start-up code). See also questions 10.9 and 16.4.

References: CT&P Sec. 2.3 pp. 21-2




comp.lang.c FAQ list · Question 2.19

Q: What's the difference between a structure and a union, anyway?


A: A union is essentially a structure in which all of the fields overlay each other; you can only use one field at a time. (You can also cheat by writing to one field and reading from another, to inspect a type's bit patterns or interpret them differently, but that's obviously pretty machine-dependent.) The size of a union is the maximum of the sizes of its individual members, while the size of a structure is the sum of the sizes of its members. (In both cases, the size may be increased by padding; see questions 2.12 and 2.13.)
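A small sketch of the difference (type names hypothetical): every member of the union starts at the same address, while the structure lays its members out one after another. Exact sizes vary, but the union needs room only for its largest member (plus any padding).

```c
union overlay {		/* members share one piece of storage */
	char c;
	int i;
	double d;
};

struct separate {	/* members occupy consecutive storage */
	char c;
	int i;
	double d;
};

/* Returns 1 if all the union's members begin at the same address. */
int members_overlap(void)
{
	union overlay u;
	return (void *)&u.c == (void *)&u.i &&
		(void *)&u.i == (void *)&u.d;
}
```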

References: ISO Sec. 6.5.2.1
H&S Sec. 5.7 pp. 140-145 esp. Sec. 5.7.4




comp.lang.c FAQ list · Question 2.20

Q: Can I initialize unions?


A: In the original ANSI C, an initializer was allowed only for the first-named member of a union. C99 introduces ``designated initializers'' which can be used to initialize any member.

In the absence of designated initializers, if you're desperate, you can sometimes define several variant copies of a union, with the members in different orders, so that you can declare and initialize the one having the appropriate first member. (These variants are guaranteed to be implemented compatibly, so it's okay to ``pun'' them by initializing one and then using the other.)
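For example, assuming a C99 compiler, a union's initializer can name whichever member it sets. The union value type below is hypothetical.

```c
union value {
	int i;
	double d;
	char *s;
};

/* Original ANSI C: only the first-named member (i) can be initialized. */
union value v1 = { 42 };

/* C99 designated initializers: any member can be chosen. */
union value v2 = { .d = 2.5 };
union value v3 = { .s = "text" };
```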

References: K&R2 Sec. 6.8 pp. 148-9
ISO Sec. 6.5.7
C9X Sec. 6.5.8
H&S Sec. 4.6.7 p. 100




comp.lang.c FAQ list · Question 2.21

Q: Is there an automatic way to keep track of which field of a union is in use?


A: No. You can implement an explicitly ``tagged'' union yourself:

struct taggedunion {
	enum {UNKNOWN, INT, LONG, DOUBLE, POINTER} code;
	union {
		int i;
		long l;
		double d;
		void *p;
	} u;
};
You will have to make sure that the code field is always set appropriately when the union is written to; the compiler won't do any of this for you automatically. (C unions are not like Pascal variant records.)
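A sketch of how the tag might be kept in step: small setter functions write the code field along with the union member, and readers switch on code before looking at the union. The functions set_int, set_double, and format_tagged are hypothetical, and format_tagged assumes C99's snprintf.

```c
#include <stdio.h>

struct taggedunion {
	enum {UNKNOWN, INT, LONG, DOUBLE, POINTER} code;
	union {
		int i;
		long l;
		double d;
		void *p;
	} u;
};

/* Setters update the tag together with the member they write. */
void set_int(struct taggedunion *t, int i)
{
	t->code = INT;
	t->u.i = i;
}

void set_double(struct taggedunion *t, double d)
{
	t->code = DOUBLE;
	t->u.d = d;
}

/* Readers consult the tag first; returns snprintf's character count. */
int format_tagged(char *buf, size_t n, const struct taggedunion *t)
{
	switch(t->code) {
	case INT:	return snprintf(buf, n, "int %d", t->u.i);
	case LONG:	return snprintf(buf, n, "long %ld", t->u.l);
	case DOUBLE:	return snprintf(buf, n, "double %g", t->u.d);
	case POINTER:	return snprintf(buf, n, "pointer %p", t->u.p);
	default:	return snprintf(buf, n, "unknown");
	}
}
```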

References: H&S Sec. 5.7.3 p. 143




comp.lang.c FAQ list · Question 2.22

Q: What's the difference between an enumeration and a set of preprocessor #defines?


A: There is little difference. The C Standard says that enumerations have integral type and that enumeration constants are of type int, so both may be freely intermixed with other integral types, without errors. (If, on the other hand, such intermixing were disallowed without explicit casts, judicious use of enumerations could catch certain programming errors.)

Some advantages of enumerations are that the numeric values are automatically assigned, that a debugger may be able to display the symbolic values when enumeration variables are examined, and that they obey block scope. (A compiler may also generate nonfatal warnings when enumerations are indiscriminately mixed, since doing so can still be considered bad style even though it is not strictly illegal.) A disadvantage is that the programmer has little control over those nonfatal warnings; some programmers also resent not having control over the sizes of enumeration variables.
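A sketch of these points (all names hypothetical): the enumeration numbers its constants automatically, resuming from an explicit value, whereas the #defines must be numbered by hand; an enumeration value mixes freely with int; and an enumeration declared inside a block is visible only there.

```c
/* Automatic numbering: RED is 0, GREEN 1; after the explicit
   BLUE = 5, counting resumes, so VIOLET is 6. */
enum color { RED, GREEN, BLUE = 5, VIOLET };

/* The hand-numbered preprocessor equivalent. */
#define C_RED    0
#define C_GREEN  1
#define C_BLUE   5
#define C_VIOLET 6

int shade(enum color c)
{
	int n = c;	/* no cast needed: the enum converts to int freely */
	{
		/* Visible only inside this block, unlike a #define,
		   which stays in effect until #undef. */
		enum { OFFSET = 10 };
		n += OFFSET;
	}
	return n;
}
```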

References: K&R2 Sec. 2.3 p. 39, Sec. A4.2 p. 196
ISO Sec. 6.1.2.5, Sec. 6.5.2, Sec. 6.5.2.2, Annex F
H&S Sec. 5.5 pp. 127-9, Sec. 5.11.2 p. 153




comp.lang.c FAQ list · Question 2.23

Q: Are enumerations really portable? Aren't they Pascalish?


A: Enumerations were a mildly late addition to the language (they were not in K&R1), but they are definitely part of the language now: they're in the C Standard, and all modern compilers support them. They're quite portable, although historical uncertainty about their precise definition led to their specification in the Standard being rather weak (see question 2.22).




comp.lang.c FAQ list · Question 2.24

Q: Is there an easy way to print enumeration values symbolically?


A: No. You can write a little function (one per enumeration) to map an enumeration constant to a string, either by using a switch statement or by searching an array. (For debugging purposes, a good debugger should automatically print enumeration constants symbolically.)
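A sketch of both approaches for a hypothetical enum color. The values are deliberately not 0, 1, 2, which is one reason to search a table of value/name pairs rather than index an array of names directly.

```c
#include <stddef.h>

enum color { RED = 1, GREEN = 2, BLUE = 4 };

/* Switch version: many compilers can warn if a constant is missing. */
const char *color_name(enum color c)
{
	switch(c) {
	case RED:	return "RED";
	case GREEN:	return "GREEN";
	case BLUE:	return "BLUE";
	}
	return "(unknown color)";
}

/* Table-search version: works for any set of values. */
static const struct {
	enum color value;
	const char *name;
} color_table[] = {
	{ RED, "RED" }, { GREEN, "GREEN" }, { BLUE, "BLUE" },
};

const char *color_name2(enum color c)
{
	size_t k;
	for(k = 0; k < sizeof color_table / sizeof color_table[0]; k++)
		if(color_table[k].value == c)
			return color_table[k].name;
	return "(unknown color)";
}
```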




comp.lang.c FAQ list · Question 2.25

Q: I came across some structure declarations with colons and numbers next to certain fields, like this:

struct record {
	char *name;
	int refcount : 4;
	unsigned dirty : 1;
};
What gives?


A: Those are bit-fields; the number gives the exact size of the field, in bits. (See any complete book on C for the details.) Bit-fields can be used to save space in structures having several binary flags or other small fields, and they can also be used in an attempt to conform to externally-imposed storage layouts. (Their success at the latter task is limited by the fact that bit-fields are assigned left-to-right on some machines and right-to-left on others.)

Note that the colon notation for specifying the size of a field in bits is only valid in structures (and in unions); you cannot use this mechanism to specify the size of arbitrary variables. (See questions 1.2 and 1.3.)
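A sketch of the record structure in use, with one change flagged as an assumption: refcount is declared unsigned here, because whether a plain int bit-field is signed is implementation-defined (see question 2.26). An unsigned 4-bit field holds 0 through 15, and storing 16 into it wraps around to 0. The function record_ref is hypothetical.

```c
struct record {
	char *name;
	unsigned refcount : 4;	/* unsigned (not int) here: holds 0..15 */
	unsigned dirty : 1;	/* holds 0 or 1 */
};

/* Bump the reference count and mark the record dirty.
   Incrementing 15 stores 16, which an unsigned 4-bit field reduces to 0. */
void record_ref(struct record *r)
{
	r->refcount++;
	r->dirty = 1;
}
```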

References: K&R1 Sec. 6.7 pp. 136-8
K&R2 Sec. 6.9 pp. 149-50
ISO Sec. 6.5.2.1
H&S Sec. 5.6.5 pp. 136-8




comp.lang.c FAQ list · Question 2.26

Q: Why do people use explicit masks and bit-twiddling code so much, instead of declaring bit-fields?


A: Bit-fields are thought to be nonportable, although they are no less portable than other parts of the language. (You don't know how big they can be, but that's equally true for values of type int. You don't know by default whether they're signed, but that's equally true of type char. You don't know whether they're laid out from left to right or right to left in memory, but that's equally true of the bytes of all types, and only matters if you're trying to conform to externally-imposed storage layouts, which is always nonportable; see also questions 2.12 and 20.5.)

Bit-fields are inconvenient when you also want to be able to manipulate some collection of bits as a whole (perhaps to copy a set of flags). You can't have arrays of bit-fields; see also question 20.8. Many programmers suspect that the compiler won't generate good code for bit-fields (historically, this was sometimes true).

Straightforward code using bit-fields is certainly clearer than the equivalent explicit masking instructions; it's too bad that bit-fields can't be used more often.
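A sketch contrasting the two styles (all names hypothetical). With masks, a whole group of flags can be tested or copied in one expression; with bit-fields, each field reads clearly, but copying the group takes one assignment per field. The bit positions in the mask version are our own choice and are not necessarily where a compiler would place the corresponding bit-fields.

```c
/* Mask version: bit positions chosen by hand. */
#define READABLE   0x01u
#define WRITABLE   0x02u
#define EXECUTABLE 0x04u
#define PERMS      (READABLE | WRITABLE | EXECUTABLE)

int can_write(unsigned word)
{
	return (word & WRITABLE) != 0;
}

/* Copy the whole permission group from src into dst, leaving
   dst's other bits alone, in a single expression. */
unsigned copy_perms(unsigned dst, unsigned src)
{
	return (dst & ~PERMS) | (src & PERMS);
}

/* Bit-field version: clearer field access, but no way to name
   "the permission bits" as a group. */
struct perms_bf {
	unsigned readable : 1;
	unsigned writable : 1;
	unsigned executable : 1;
	unsigned other : 5;
};

void copy_perms_bf(struct perms_bf *dst, const struct perms_bf *src)
{
	dst->readable = src->readable;
	dst->writable = src->writable;
	dst->executable = src->executable;
}
```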









Hosted by Eskimo North