From: scs@eskimo.com (Steve Summit)
Newsgroups: comp.lang.c
Subject: Re: File size
Date: 21 Jun 1999 13:08:17 GMT
Message-ID: <7kldg1$liv$1@eskinews.eskimo.com>

In article <7khuqi$a72$1@nnrp1.deja.com>, Syco writes:
> Yes, I have read that portion of the FAQ...
> It doesn't say how to find the file size, at least, not in a portable
> manner, which it needs to be because I'm programming this on a Windows
> machine but I think I'm going to have to compile it on a SCO Unix
> machine.

If there were a portable way of determining file sizes, believe you me, the FAQ list would mention it.

The only portable way of determining a file's ``size'' is to open it, read it, and count the characters, but this (a) may not give you the answer you wanted, and (b) is likely to be unacceptably, pointlessly inefficient.
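For what it's worth, the open-read-count approach might look something like the sketch below. The function name count_chars is my own invention for illustration. It opens the file in text mode, so on systems that translate line endings it counts the characters a program would see, which is not necessarily the number of bytes on disk.

```c
#include <stdio.h>

/* Count the characters in a file by reading it to the end.
 * Returns the count, or -1 if the file can't be opened or a
 * read error occurs.  (count_chars is a hypothetical name.) */
long count_chars(const char *filename)
{
    FILE *fp = fopen(filename, "r");    /* text mode */
    long n = 0;
    int c;

    if (fp == NULL)
        return -1;

    while ((c = getc(fp)) != EOF)
        n++;

    if (ferror(fp)) {                   /* stopped on an error, not EOF */
        fclose(fp);
        return -1;
    }

    fclose(fp);
    return n;
}
```

Note that this touches every character in the file just to produce one number, which is exactly the inefficiency mentioned above.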

If you truly need a file's size in advance, go ahead and use something OS-specific (and, hence, not portable). Under Unix-like and POSIX-compatible systems, definitely use stat() (or maybe fstat(), if you already have the file open); the size comes back in the st_size member of the struct stat. stat() has existed under the MS-DOS compilers I've used, too. (I can't help you with Windows, however.)
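A minimal sketch of the stat() approach, assuming a POSIX-like system (this is not Standard C, and the wrapper name file_size is just something I made up for the example):

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Return the size in bytes of the named file, or -1 if stat() fails.
 * POSIX-specific.  st_size is only really meaningful for regular
 * files.  (file_size is a hypothetical wrapper name.) */
off_t file_size(const char *filename)
{
    struct stat st;

    if (stat(filename, &st) != 0)
        return -1;

    return st.st_size;
}
```

If the file is already open, fstat(fileno(fp), &st) fills in the same structure without looking the name up again.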

If the reason you need a file's size in advance is so that you can malloc a buffer big enough for reading in the file all at once, however, you have two other options. One is to process the file sequentially, rather than reading it in all at once; reading it in all at once isn't always the best idea, after all. The other option is that if you can guess the file's size, or get an approximation to it somehow (for example, stat() under MS-DOS gives the ``wrong'' size due to CRLF <=> \n translations), you can fairly easily write some code that malloc's the approximate amount, starts reading the file, and realloc's if necessary.

If you'd like to see this worked out in detail, see section 11.3 of my C Programming Notes at http://www.eskimo.com/~scs/cclass/cclass.html, and the answers to introductory assignment 6. (The examples there show how to read a file line-by-line, not as a contiguous block, but they should give you some ideas.)
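Here is one way the guess-then-realloc idea might be sketched for reading a file as a single contiguous block. This is not the code from the class notes; the function name read_all, its parameters, the fallback guess of 512, and the doubling strategy are all assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read everything remaining on fp into a malloc'ed buffer.  Starts
 * with room for `guess' characters and doubles the buffer whenever
 * it fills up.  On success, returns the buffer (which the caller
 * must free) and stores the number of characters read in *lenp.
 * Returns NULL on allocation failure or read error.
 * (read_all and its parameters are hypothetical.) */
char *read_all(FILE *fp, size_t guess, size_t *lenp)
{
    size_t cap = guess > 0 ? guess : 512;   /* arbitrary fallback */
    size_t len = 0;
    size_t n;
    char *buf = malloc(cap + 1);            /* +1 for the '\0' */

    if (buf == NULL)
        return NULL;

    while ((n = fread(buf + len, 1, cap - len, fp)) > 0) {
        len += n;
        if (len == cap) {
            /* the guess was too small: grow and keep reading */
            size_t newcap = cap * 2;
            char *newbuf = realloc(buf, newcap + 1);
            if (newbuf == NULL) {
                free(buf);
                return NULL;
            }
            buf = newbuf;
            cap = newcap;
        }
    }

    if (ferror(fp)) {
        free(buf);
        return NULL;
    }

    buf[len] = '\0';
    *lenp = len;
    return buf;
}
```

A caller with a size from stat() could pass it as the guess; if the guess is a little off in either direction, the code still works, it just wastes a bit of memory or does an extra realloc.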

Once in a very long while you may have to write code to read a file twice (once to count the characters in it, and a second time to do something with them, once you know precisely how many of them there are), but this should be an absolute last resort, because it's likely to lead to some pretty severe performance problems, at least for large files.
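For completeness, a rough sketch of the two-pass approach follows. The name read_twice is hypothetical, and the code assumes the stream is seekable (an ordinary disk file, not a pipe or terminal), since it relies on rewind() to get back to the beginning.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a file in two passes: first count the characters, then
 * allocate exactly that much and read them in.  Assumes fp is
 * positioned at the beginning and is seekable.  Returns a
 * malloc'ed, '\0'-terminated buffer, or NULL on failure.
 * (read_twice is a hypothetical name.) */
char *read_twice(FILE *fp, size_t *lenp)
{
    size_t count = 0;
    size_t i;
    int c;
    char *buf;

    /* first pass: count the characters */
    while ((c = getc(fp)) != EOF)
        count++;
    if (ferror(fp))
        return NULL;

    rewind(fp);     /* back to the start; also clears the EOF flag */

    buf = malloc(count + 1);
    if (buf == NULL)
        return NULL;

    /* second pass: read them; stop early if the file has shrunk
     * and never read past the allocated count if it has grown */
    for (i = 0; i < count && (c = getc(fp)) != EOF; i++)
        buf[i] = (char)c;

    buf[i] = '\0';
    *lenp = i;
    return buf;
}
```

Every character gets read twice here, which is where the performance cost for large files comes from.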

Steve Summit
scs@eskimo.com