Unfortunately there's no easy, standard way to read in all of an arbitrarily long line of text in C. You might think of using (f)scanf or gets, but both of these have their drawbacks.
For a start, gets doesn't check the size of the buffer you give it, so it's easy to get a buffer overflow. For this reason never ever use gets; it was removed from the language entirely in C11.
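The usual bounded replacement for gets is fgets, which takes the buffer size as an argument. Here is a minimal sketch; the helper name read_bounded is hypothetical. Unlike gets, fgets keeps the newline, so the sketch strips it. A line longer than the buffer is silently split across calls, which is exactly the problem the rest of this article deals with.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: a safe stand-in for gets(buf).
   fgets stores at most size - 1 characters plus a '\0',
   so it can never write past the end of buf.
   Returns buf, or NULL at end of file (or on a read error). */
char * read_bounded(char * buf, int size, FILE * f)
{
    if (fgets(buf, size, f) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';  /* drop the newline, as gets would */
    return buf;
}
```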
Similarly, naively using the %s format with (f)scanf can easily lead to buffer overflow. You can get around this by giving a field width, as in %20s, to tell it to read in at most that many characters. Unfortunately the width is fixed in the format string, anything longer is simply left unread, and %s stops at the first whitespace anyway, so it's not very flexible.
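A small sketch of the bounded form. It uses sscanf on a fixed string rather than fscanf on a stream only so the behaviour is easy to see; the conversion rules are the same. The name read_word20 is hypothetical. Note that the width does not count the terminating '\0', so the buffer needs 21 bytes.

```c
#include <stdio.h>

/* Hypothetical helper: reads at most 20 non-whitespace characters
   into word, which must have room for 21 (20 + the '\0').
   Returns 1 on success. Reading stops at the first whitespace,
   so this gets one word, not a whole line. */
int read_word20(const char * input, char word[21])
{
    return sscanf(input, "%20s", word);
}
```

With "hello world" this yields just "hello"; with the 26-letter alphabet it yields the first 20 letters and leaves the rest unread.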
There are a few solutions to this. Here are some of them, in decreasing order of convenience and (approximately) increasing order of portability:
- %as extension to (f)scanf
- readline library
- getline function
- fgets and realloc
%as extension to fscanf
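Normally the %s format causes fscanf to read in a whitespace-delimited string of any length. As has been mentioned, this is insecure and should not be used. However, if you are using GNU libc, you can make use of the %as format. This is exactly the same as %s except that fscanf allocates a big enough buffer itself with malloc, so you pass it a char ** rather than a char * (since the function needs to change what the argument points to). The a flag also works with the %[ conversion, so %a[^\n] reads everything up to the end of the line rather than a single word. POSIX.1-2008 later standardized the same feature using the letter m instead (%ms, %m[^\n]), and recent versions of glibc treat %a as the C99 floating-point conversion unless you compile in a pre-C99 mode, so the example below uses m. So you might use something like:

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads one line from f, without its '\n'. Returns a malloc'd
   string the caller must free(), or NULL at end of file.
   Not called getline, which <stdio.h> may already declare. */
char * scan_line(FILE * f)
{
    char * buf = NULL;
    /* %m[^\n]: allocate a buffer and read up to the newline
       (spelled %a[^\n] in old GNU-only code) */
    int result = fscanf(f, "%m[^\n]", &buf);

    if (result == EOF)           /* end of file or read error */
        return NULL;
    if (result == 0)             /* empty line: nothing before the '\n' */
        buf = calloc(1, 1);      /* return "" rather than NULL */
    fgetc(f);                    /* consume the '\n' (harmless at EOF) */
    return buf;
}
```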
This is a very handy mechanism. Its major disadvantage is that it is not part of the ANSI/ISO C standards, and the a spelling actually conflicts with them: in C99, %a reads a floating-point number. POSIX.1-2008's equivalent m modifier (%ms) is more widely supported, but it isn't ISO C either. Unless you are sure your C library supports it and you don't mind breaking portability (usually not a good idea), you shouldn't use this.
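To see the clash concretely, here is %a used the way C99 defines it, as a floating-point conversion that accepts the same forms as strtod, including hexadecimal floats. Under these rules "%as" means "read a float, then match a literal s", not "allocate a string". The function name parse_float_with_a is hypothetical.

```c
#include <stdio.h>

/* Hypothetical demo: in ISO C99 scanf, %a behaves like %f and
   stores into a float *. Returns the number of conversions (1 on success). */
int parse_float_with_a(const char * input, float * out)
{
    return sscanf(input, "%a", out);
}
```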
readline library

readline is an extremely handy library for accepting user input. Not only is it easy to use from the programmer's perspective, it provides the user with command line editing such as you would expect in e.g. the bash shell. In fact, bash uses readline for user input, as do many other programs, so you may well already be familiar with it.
There is a lot of scope for customising readline, but for basic usage all you need is something like:
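```c
#include <stdio.h>              /* include before readline.h, which uses FILE */
#include <readline/readline.h>

/* Returns a malloc'd line without its trailing newline,
   or NULL at end of file. The caller must free() it. */
char * foo(void)
{
    return readline("Prompt: ");
}
```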
readline(3) allocates a fresh buffer for each line it reads (with the trailing newline removed) and leaves it to you to deal with; don't forget to free() it when you're finished. If it hits end of file on an empty line, it returns NULL. Remember to link your executable against libreadline and libtermcap (with gcc, this means using the linker switches -lreadline -ltermcap; on many current systems -lreadline alone is enough), and make sure the readline header files are in your include path. Most modern Linux distributions have readline installed by default, though you may need to install the headers yourself (they are usually in a package called something like readline-devel).
The drawbacks to readline are:
- Not every system has readline installed. You can get around this by statically linking the library, but remember it is rather big (around 167 KB for version 4.3).

getline function

getline is a function added to the GNU version of libc to address the very problem this article discusses. For terminal input readline is easier to use and arguably more portable, so if you are only accepting terminal input, use that instead.
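Here's the kind of thing you'll need to do to use getline:

```c
#define _GNU_SOURCE      /* make sure <stdio.h> declares getline() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>   /* ssize_t */

/* Returns the next line, including its '\n' if present, as a malloc'd
   string the caller must free(), or NULL at EOF or on error. */
char * foo(FILE * f)
{
    char * buf = NULL;   /* NULL and 0 ask getline to allocate the buffer */
    size_t n = 0;        /* getline takes a size_t *, not an int * */
    ssize_t result = getline(&buf, &n, f);

    if (result < 0) {
        free(buf);       /* getline may have allocated even on failure */
        return NULL;
    }
    return buf;
}
```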
getline is an extension to the stdio library: it began as a GNU extension and was later standardized in POSIX.1-2008, so it is available on most Unix-like systems, but it is not part of ISO C and Microsoft's C runtime doesn't provide it. It's not a good idea to statically link it, as that means statically linking the whole C library into your program; usually you want to link standard libraries dynamically. Moreover, updates to libc won't affect a statically linked copy.
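Where getline's extra bookkeeping pays off is in a loop: the same buffer can be handed back on every call, and getline only reallocates it when a longer line turns up. A minimal sketch, assuming a POSIX.1-2008 C library; count_lines is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for the POSIX.1-2008 getline() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>            /* ssize_t */

/* Hypothetical helper: counts the lines in f and stores the length of the
   longest one in *longest, reusing a single buffer for every getline() call. */
size_t count_lines(FILE * f, size_t * longest)
{
    char * buf = NULL;   /* getline allocates and grows this as needed */
    size_t cap = 0;      /* current allocated size, updated by getline */
    ssize_t len;
    size_t count = 0;

    *longest = 0;
    while ((len = getline(&buf, &cap, f)) != -1) {
        count++;
        if ((size_t)len > *longest)
            *longest = (size_t)len;   /* length includes the '\n', if any */
    }
    free(buf);           /* one free() for the whole loop */
    return count;
}
```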
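Another disadvantage is that it trades simplicity for flexibility (you manage the buffer and its size yourself, which lets you reuse one buffer across many calls), so it is not quite as easy to use as the (f)scanf extension above or a custom function.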
fgets and realloc
Once all these options are exhausted (GNU libc or readline is not reliably available, you're trying to read from a file rather than a terminal, ...), the only option left is to roll your own getline-style function. Fortunately this is quite easy, as you can combine two completely standard library functions: fgets and realloc.
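realloc is like malloc, except that it resizes an existing block of memory instead of creating a new one. The contents of the block are preserved (up to the smaller of the old and new sizes), but the block may be moved, so always use the pointer realloc returns. If it fails, it returns NULL and leaves the original block untouched.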
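Because realloc can fail and can move the block, the usual idiom assigns its result to a temporary first; writing buf = realloc(buf, n) directly would leak the old block on failure. A minimal sketch; grow is a hypothetical name, and freeing the old block when realloc fails is just one possible policy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: resizes the malloc'd block buf to new_size bytes.
   The old contents are kept, but the block may move, so the caller must
   use the returned pointer. On failure the old block is freed here
   (an assumed policy) and NULL is returned. */
char * grow(char * buf, size_t new_size)
{
    char * tmp = realloc(buf, new_size);  /* buf is still valid if this fails */
    if (tmp == NULL) {
        free(buf);
        return NULL;
    }
    return tmp;
}
```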
fgets reads a whole line of text from a stream (i.e. a FILE *), unless it runs out of space or hits EOF first: fgets(buf, n, f) stores at most n - 1 characters followed by a terminating '\0', and stops after the first newline, which it keeps. So it can only read in so much text before it has to give up. This means that, provided you do not tell it there is more space in the buffer than there really is, you will not get buffer overflows this way. You can get around the length restriction by defining a function which reads from the stream repeatedly, getting more memory as needed, until finally the whole line has been read.
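The building block for that loop is telling a complete line apart from a partial one: if the chunk fgets returns ends in '\n', the line is finished; otherwise there is more to come (or the file ended without a newline). A small sketch; read_chunk is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads the next chunk of at most size - 1 characters.
   Returns 1 if the chunk ends the line (contains '\n'), 0 if the line
   continues or the file ended without a newline, and -1 if nothing
   could be read (end of file or read error). */
int read_chunk(char * buf, int size, FILE * f)
{
    if (fgets(buf, size, f) == NULL)
        return -1;
    return strchr(buf, '\n') != NULL;
}
```

With a 5-byte buffer, the line "abcdefgh\n" comes back as "abcd", then "efgh", then "\n".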
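Here is one possible implementation:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads a whole line from f, including its '\n' if there is one.
   Returns a malloc'd string to free() later, or NULL at EOF or if
   memory runs out. Not called getline, which <stdio.h> may declare. */
char * read_line(FILE * f)
{
    size_t size = 0;   /* bytes allocated */
    size_t len = 0;    /* characters read so far, excluding the '\0' */
    char * buf = NULL;

    do {
        size += BUFSIZ;  /* BUFSIZ: stdio's default buffer size, from <stdio.h> */
        char * tmp = realloc(buf, size);  /* realloc(NULL, n) acts like malloc(n) */
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        /* Read into the free space only, starting on top of the previous
           terminating '\0'; fgets writes a new one after what it reads */
        if (fgets(buf + len, (int)(size - len), f) == NULL)
            break;       /* EOF or error: nothing more was read */
        len += strlen(buf + len);
    } while (buf[len - 1] != '\n');

    if (len == 0) {      /* EOF before any characters */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```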
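This is not quite as efficient as it could be (growing the buffer by a fixed BUFSIZ bytes at a time means realloc may copy a very long line many times over); optimisation is left as an exercise for the reader.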
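One common optimisation, sketched here under stated assumptions, is to double the buffer whenever it fills instead of adding a fixed amount, so a line of n bytes needs only about log2(n) calls to realloc and the total copying stays proportional to n. The name read_line_fast and the starting size of 128 bytes are arbitrary choices for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical read_line_fast: same contract as a fgets/realloc line reader
   (returns a malloc'd line including any '\n', or NULL at EOF or when out
   of memory), but doubles the buffer each time it fills up. */
char * read_line_fast(FILE * f)
{
    size_t size = 128;   /* assumed initial capacity; any value >= 2 works */
    size_t len = 0;      /* characters stored so far, excluding the '\0' */
    char * buf = malloc(size);
    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(size - len), f) != NULL) {
        len += strlen(buf + len);
        if (buf[len - 1] == '\n')
            break;                        /* complete line read */
        if (len + 1 == size) {            /* buffer full: double it */
            char * tmp = realloc(buf, size * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            size *= 2;
        }
        /* otherwise the file ended mid-line; the next fgets returns NULL */
    }

    if (len == 0) {                       /* EOF before any characters */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```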