Piping, Forking, and Gathering execvp's Output

**Scriptonaut** · 11-19-2012

Hey guys,

I'm building a minishell in C (for Unix). I'm doing command expansion, so let's say I have:

Code:

echo a $(echo b c)

it works fine, however if I then do this:

Code:

echo a $(echo b)

it prints:

Code:

a b c

I think I've figured it out: when I'm reading the output of echo (from my pipe), echo does not output any null terminator, so I go on for a bit until I run into one. This results in me picking up a bunch of garbage. Currently my read loop looks like this:

Code:

bytes_read = read(fd[0],buf,BUF_SIZE);
while(bytes_read > 0)
{
	/* HERES THE TROUBLE! */
	bytes_read = read(fd[0], buf, BUF_SIZE);
	if (bytes_read == -1) perror("read");
}
close(fd[0]);

So here it picks up a bit extra. I was thinking of simply turning the last '\n' into a 0, but I can't figure out how to start at the end of the string and work my way back (if I could, I probably wouldn't be having this problem), and if I start at the beginning and work forward, I may remove a newline that was meant to be there.

I was also thinking that I could write a null character right after echo executes, however, I don't know how I'd do this. When echo writes, it looks like this:

Code:

cpid = fork();
if (cpid < 0) {
	perror ("fork");
	free(args);
	return;
}
		
/* Check for who we are! */
if (cpid == 0) {
  /* We are the child! */
	
	/* Turning stdout into pipe output if needed */
	if((dup2(outFD, 1)) < 0)
	{
		perror("dup");
		return;
	}
			
	execvp(args[0], args);
	perror ("exec");
	exit (127);
	last_exit_status = 127;
}
		
dprintf(outFD, 0);
		
/* Have the parent wait for child to complete */
if(waitFlag)
{
	if (wait (&status) < 0) perror ("wait");
	last_exit_status = ((WEXITSTATUS(status) || !WIFEXITED(status)) ? 127 : 0);
}

I was thinking the dprintf(outFD, 0) line would work, but it doesn't seem to be the case. How do I mark the end of execvp's output?

Thanks for the help, if you guys want me to post my code I will.

**Salem** · 11-19-2012

All this talk of \n and \0 is way off being the right answer.
The low-level read() and write() functions just deal with a 'count' of the number of bytes processed. They pay no attention at all to the content of the stream.

Copying from one descriptor to another is simply

Code:

while ( (n=read(fd[0],buf,BUF_SIZE)) > 0 ) {
    write(fd[1],buf,n); // pedantically, check write returned n as well for success.
}
if ( n == 0 ) // fd was closed
if ( n < 0 ) // fd error'ed
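To spell out the "pedantically" comment: write() can return a short count too, so a careful copy loop retries until the whole chunk has gone out. A minimal sketch, assuming blocking descriptors; write_all() and copy_fd() are names made up for illustration, not anything from your code.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes, retrying on short
 * writes and EINTR. Returns 0 on success, -1 on error (errno set). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal, retry */
            return -1;             /* real error */
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: copy everything from in to out until end of input.
 * Returns the number of bytes copied, or -1 on error. */
long copy_fd(int in, int out)
{
    char buf[256];
    long total = 0;

    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            return total;          /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal, retry */
            return -1;             /* real error */
        }
        if (write_all(out, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
}
```

Note that nothing in either function looks at the bytes themselves; only the counts returned by read() and write() matter.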

**Nominal Animal** · 11-19-2012

Originally Posted by Scriptonaut

Code:

bytes_read = read(fd[0],buf,BUF_SIZE);
while(bytes_read > 0)
{
	/* HERES THE TROUBLE! */
	bytes_read = read(fd[0], buf, BUF_SIZE);
	if (bytes_read == -1) perror("read");
}
close(fd[0]);

First of all, you cannot assume a single read() call reads everything. It can return a short count (only some of the data available), or -1, even without errors. In particular, if a signal is delivered while a read() call blocks, the call will return -1 with errno==EINTR. And since you are using pipes and child processes, you should be handling signals anyway.

You'll know you have read all data when read() returns zero. (Or if it returns -1 with errno set to something other than EINTR, EWOULDBLOCK, or EAGAIN.)

Therefore, what you need is a loop that reads more into a (preferably dynamically allocated) buffer, until read() returns zero (indicating end of input) or returns -1 with errno set to indicate a real error (in which case you should probably abort).

Second, the data you read is not strings. It is just data. You cannot assume it is a string. You cannot use the string functions to manipulate it; you must treat it as a character array instead. (The difference is that a string is terminated, whereas a character array has a specific length.) For example, consider the case where you convert an image using the Netpbm tools and a pipe. If you assume it is a string, you'll break the data! Instead, you simply need to record the amount of data read.
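A small illustration of that difference (just a sketch; the payload and the function name count_vs_strlen() are made up for the demo): five bytes with an embedded '\0' go through a pipe, and strlen() sees fewer of them than read() reported.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical demo: push 5 bytes with an embedded NUL through a pipe and
 * report both the count from read() and what strlen() makes of the data.
 * Returns 0 on success, -1 on error. */
int count_vs_strlen(size_t *count, size_t *as_string)
{
    const char payload[5] = { 'a', 'b', '\0', 'c', 'd' };
    char buf[16];
    int fd[2];
    ssize_t n;

    if (pipe(fd) < 0)
        return -1;
    if (write(fd[1], payload, sizeof payload) != (ssize_t)sizeof payload) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    close(fd[1]);

    n = read(fd[0], buf, sizeof buf - 1);
    close(fd[0]);
    if (n < 0)
        return -1;

    buf[n] = '\0';             /* terminate so that strlen() is safe to call */
    *count = (size_t)n;        /* number of bytes actually read */
    *as_string = strlen(buf);  /* stops at the first embedded NUL */
    return 0;
}
```

The count is the only reliable length; the string view silently loses everything after the first NUL.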

For command substitution, you will convert the data read into one or more strings (tokens). I don't know what expansion rules you have chosen, but I think you want to split $(command...) at whitespace into separate tokens (ignoring leading and trailing whitespace, and treating consecutive whitespace as a single separator).

So, you read() all data into a buffer, then convert it into one or more strings. The conversion rules are simple: skip all NUL (\0) and whitespace characters (isspace(), http://www.kernel.org/doc/man-pages/online/pages/man3/isspace.3.html) in the buffer first. Copy the consecutive non-NUL, non-whitespace characters to your command line buffer. When you see a NUL or whitespace, skip them, and if followed by a non-NUL, non-whitespace character, start a new token. When you arrive at the end of the buffer (remember, you must keep track of the index so you won't go past the number of chars you read()), you have converted the output to your command line.
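Those rules could be sketched roughly like this. It assumes you want the tokens joined by single spaces in a fixed-size output buffer; tokens_to_line() and its parameters are hypothetical names, so adapt them to however your shell stores the expanded line.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy the tokens found in data[0..len-1] into out,
 * separated by single spaces, and terminate out with '\0'.
 * NUL bytes and isspace() characters both count as separators.
 * Returns the string length written. The output is truncated if out is
 * too small; outsize must be at least 1. */
size_t tokens_to_line(const char *data, size_t len, char *out, size_t outsize)
{
    size_t i = 0, o = 0;

    while (i < len) {
        /* Skip separators. */
        while (i < len && (data[i] == '\0' || isspace((unsigned char)data[i])))
            i++;
        if (i >= len)
            break;

        /* Separate from the previous token, if there was one. */
        if (o > 0 && o + 1 < outsize)
            out[o++] = ' ';

        /* Copy one token, always leaving room for the final '\0'. */
        while (i < len && data[i] != '\0' && !isspace((unsigned char)data[i])) {
            if (o + 1 < outsize)
                out[o++] = data[i];
            i++;
        }
    }

    out[o] = '\0';
    return o;
}
```

The loop is driven entirely by len, never by a terminator in the input, so it cannot run past the data you actually read.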

**Scriptonaut** · 11-19-2012

Originally Posted by Salem

All this talk of \n and \0 is way off being the right answer.
The low-level read() and write() functions just deal with a 'count' of the number of bytes processed. They pay no attention at all to the content of the stream.

Copying from one descriptor to another is simply

Code:

while ( (n=read(fd[0],buf,BUF_SIZE)) > 0 ) {
    write(fd[1],buf,n); // pedantically, check write returned n as well for success.
}
if ( n == 0 ) // fd was closed
if ( n < 0 ) // fd error'ed

I'm actually writing in the child process (I'm not even writing myself, I just call dup2(outFD, 1) so that the exec'd program writes for me). So here's my dilemma: how do I determine how many characters were written when I am reading in the parent? As far as I know, execvp doesn't return any info describing how many characters/bytes it wrote. I need to know this so I know when to stop copying from buf to my expanded string.

Here's another dilemma (that I didn't know I was having until I read your reply). I eventually fixed my program, but the way I did it was by removing the while loop, and simply changing it to:

Code:

bytes_read = read(fd[0],buf,BUF_SIZE);
if(bytes_read < 0) perror("read");
buf[bytes_read] = 0;
close(fd[0]);

The guy below says that this is wrong, because read won't necessarily read all the characters written to the pipe. So, how do I know when to stop reading? Will it attempt to read BUF_SIZE bytes? Does execvp insert some sort of terminating character? And if read takes a few calls (meaning I would need to implement a loop), how could I get the cumulative number of bytes written? Would the following work?

Code:

total_bytes_read = 0;
while((bytes_read = read(fd[0],buf,BUF_SIZE)))
{
	if(bytes_read < 0) perror("read");
	if(bytes_read > 0) total_bytes_read += bytes_read;
}
buf[total_bytes_read] = 0;
close(fd[0]);

Thanks so much for the help, this has been killing me. If you need me to post any additional info, just ask and I'd be more than happy to.

**Scriptonaut** · 11-19-2012

Originally Posted by Nominal Animal

First of all, you cannot assume a single read() call reads everything. It can return a short count (only some of the data available), or -1, even without errors. In particular, if a signal is delivered while a read() call blocks, the call will return -1 with errno==EINTR. And since you are using pipes and child processes, you should be handling signals anyway.

You'll know you have read all data when read() returns zero. (Or if it returns -1 with errno set to something other than EINTR, EWOULDBLOCK, or EAGAIN.)

Alright, well the next part of this current assignment is to add signal handling, so I can start doing that now. I wasn't aware that read may not read it all at once, and I wasn't aware that it may return -1 without error. So basically what I need to do is the following?:

Code:

total_bytes_read = 0;
while((bytes_read = read(fd[0],buf,BUF_SIZE)))
{
	if(bytes_read < 0)
	{
		if(errno == EWOULDBLOCK || errno == EAGAIN) continue;
		else if(errno == EINTR) { /* DEAL WITH SIGNAL HANDLING HERE */ }
		else
		{
			perror("read");
			break; // Or should this be exit?
		}
	}
	else if(bytes_read > 0) total_bytes_read += bytes_read;
}
buf[total_bytes_read] = 0;
close(fd[0]);

Therefore, what you need is a loop that reads more into a (preferably dynamically allocated) buffer, until read() returns zero (indicating end of input) or returns -1 with errno set to indicate a real error (in which case you should probably abort).

Wait, dynamically allocated? I'm not sure whether mine counts as dynamically allocated: I'm not using malloc, but the function is being called recursively, so it may allocate any amount of memory. I have to be really careful to prevent stack overflows (according to my teacher), because he will be testing my script with command expansion up to 200,000 characters (with 256 byte buffers).

Second, the data you read is not strings. It is just data. You cannot assume it is a string. You cannot use the string functions to manipulate it; you must treat it as a character array instead. (The difference is that a string is terminated, whereas a character array has a specific length.) For example, consider the case where you convert an image using the Netpbm tools and a pipe. If you assume it is a string, you'll break the data! Instead, you simply need to record the amount of data read.

Alright, I'm starting to get this now. How exactly does read know when to stop reading? How does it know when execvp stopped outputting to my pipe?

For command substitution, you will convert the data read into one or more strings (tokens). I don't know what expansion rules you have chosen, but I think you want to split $(command...) at whitespace into separate tokens (ignoring leading and trailing whitespace, and treating consecutive whitespace as a single separator).

Ya, I am taking a single $(cmd ...) found in a string, and then expanding it. Since the entire cmd expansion system is recursive, it automatically separates it at whitespaces(and condenses whitespace to a single space)

So, you read() all data into a buffer, then convert it into one or more strings. The conversion rules are simple: skip all NUL (\0) and whitespace characters (isspace(), http://www.kernel.org/doc/man-pages/online/pages/man3/isspace.3.html) in the buffer first. Copy the consecutive non-NUL, non-whitespace characters to your command line buffer. When you see a NUL or whitespace, skip them, and if followed by a non-NUL, non-whitespace character, start a new token. When you arrive at the end of the buffer (remember, you must keep track of the index so you won't go past the number of chars you read()), you have converted the output to your command line.

I'm having trouble following this part. Read will automatically separate the buffer into tokens for me? How does it determine where to separate them? So if I come across two consecutive NUL or whitespace characters, or if I reach the end of the buffer length, I can stop copying? Currently I'm copying from buf to my expanded string as follows:

Code:

/* COPY OUTPUT OF COMMAND TO EXPANDED LINE, UPDATE newl_ind */
i = 0;
while(buf[i] != 0)
{
	new[*newl_ind + i] = buf[i];
	i++;
}
*newl_ind = *newl_ind + i;
	
/* REMOVING EXTRA LINES */
i = 0;
while(new[i])
{
	if(new[i] == '\n') new[i] = ' ';
	i++;
}

This is obviously some crappy code, I don't even check to see if I am going beyond the bounds of the buffer. As you can see in my first code tag, I mark what I consider to be the end of the buffer with a NULL terminator. So this copy loop is completely based off of that.

Thanks for the help, you have really saved me several hours of confusion. Once I figure out these additional questions I'll be able to finish cmd expansion entirely. If you need any additional info just ask

**Nominal Animal** · 11-20-2012

Originally Posted by Scriptonaut

I wasn't aware that read may not read it all at once, and I wasn't aware that it may return -1 without error. So basically what I need to do is the following?

Here's the actual code I'd use if I was you.

It takes the file descriptor, a pointer to a buffer pointer, a pointer to the buffer size, a pointer to the size of data already in buffer, and the size to reserve at the end of the buffer, as parameters. It reads everything from the descriptor (until end of input) into the buffer, dynamically growing it if necessary.

You can reuse a previous buffer, allocate an initial one, or set the pointed-to values to NULL, 0, 0. Remember to free() the buffer after you no longer need it. See the example at end of this message.

The function ignores signal delivery interrupts, but will return an error if the descriptor is nonblocking and no data is immediately available. (File descriptors are blocking unless you explicitly open them, or later set them, nonblocking.)

The function interface is such that if you deem an error unimportant, you can simply call the function again, to read the rest of the input. You can even read data sequentially from multiple sources into one buffer. You can even pre-fill the buffer with your own data.

I would not normally show such complete code, but this is so often implemented only partially, or downright wrong, that I feel it should be shown completely.

Code:

#include <unistd.h>
#include <stdlib.h>
#include <errno.h>

/* If there are less than READ_MIN bytes available for
 * incoming data in the buffer, the buffer is reallocated. */
#define   READ_MIN   4096

/* When reallocating, the buffer is resized
 * for this amount of incoming data. */
#define   READ_MAX   131072

/* Read everything from a descriptor into a dynamically allocated buffer.
 * The function will return zero if success, errno otherwise.
 *   descriptor: File descriptor to read from.
 *   dataptr: Pointer to the buffer pointer.
 *   sizeptr: Pointer to the allocated size of the buffer.
 *   usedptr: Pointer to the number of chars in the buffer.
 *   reserve: Number of chars to reserve after the buffer.
 * The buffer must be either dynamically allocated,
 * or initialized to NULL,0,0.
*/
int read_all(int const descriptor,
             char  **const dataptr,
             size_t *const sizeptr,
             size_t *const usedptr,
             size_t  const reserve)
{
    if (descriptor != -1 && dataptr && sizeptr && usedptr) {
        char   *data = *dataptr;
        size_t  size = *sizeptr;
        size_t  used = *usedptr;
        ssize_t n;

        while (1) {

            /* Need to reallocate the buffer? */
            if (used + READ_MIN + reserve > size) {

                size = used + READ_MAX + reserve;
                data = realloc(data, size);
                if (!data)
                    return errno = ENOMEM;

                /* Update data and size for the caller. */
                *dataptr = data;
                *sizeptr = size;
            }

            /* Read more data. */
            do {
                n = read(descriptor, data + used, size - used - reserve);
            } while (n == (ssize_t)-1 && errno == EINTR);

            /* Error? If so, errno is already set. */
            if (n == (ssize_t)-1)
                return errno;

            /* Rare I/O error? */
            if (n < (ssize_t)-1)
                return errno = EIO;

            /* End of input? */
            if (n == (ssize_t)0)
                return errno = 0;

            /* We have n more chars read. */
            used += n;
            *usedptr = used;
        }
    } else {

        /* descriptor is -1, or one of the pointers NULL. */
        return errno = EINVAL;
    }
}

Here is an example main() to explore how the function works. I'll even throw in an example trim(), that converts the data into a string (adding a '\0' at the end), replacing all ASCII control characters and whitespace with a single space, and trimming out leading and trailing control characters and whitespace.

Code:

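#include <stdio.h>
#include <stdlib.h>   /* free() */
#include <string.h>
#include <errno.h>
#include <unistd.h>   /* STDIN_FILENO */
#include "read.h"     /* read_all() from the listing above */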

void trim(char *const data, size_t len)
{
    size_t  i = 0;
    size_t  o = 0;

    while (i < len)
        if (data[i] >= 0 && data[i] <= 32) {
            /* data[i] is an ASCII whitespace or control character. Skip. */
            while (i < len && data[i] >= 0 && data[i] <= 32)
                i++;

            /* Add separator, but only between tokens. */
            if (i < len && o > 0)
                data[o++] = ' ';
        } else
            data[o++] = data[i++];

    /* Note: o may be len, so this may be data[len] = '\0'. */
    data[o] = '\0';
}


int main(void)
{
    char   *data = NULL;
    size_t  size = 0;
    size_t  len  = 0;

    if (read_all(STDIN_FILENO, &data, &size, &len, 1)) {
        fprintf(stderr, "Error reading from standard input: %s.\n", strerror(errno));
        fflush(stderr);
        /* Do not abort, though. */
    }

    if (len > 0) {
        /* Trim whitespace from the input data, and append '\0'.
         * The final '\0' is why we reserved 1 char in read_all(). */
        trim(data, len);
    }

    if (len > 0) {
        printf("Read %lu chars of input, into a %lu char buffer.\n",
               (unsigned long)len, (unsigned long)size);
        printf("Trimmed down to '%s'.\n", data);
    } else
        printf("No input.\n");

    free(data);
    data = NULL;
    size = 0;
    len  = 0;

    return 0;
}

Questions?

Originally Posted by Scriptonaut

I'm not using malloc

Then you're doing it wrong.

I hope you get your program working. I can imagine the contortions you have to do to make it work with just local variables (on stack)..

Originally Posted by Scriptonaut

Alright, I'm starting to get this now. How exactly does read know when to stop reading?

Whenever read() returns zero, it means there is no more data to read. Either you are at the end of the file, or the other end of the pipe or socket closed the connection.

(On a terminal, pressing Ctrl+D at the start of a line also makes read() return zero; it just does not close the connection. It only tells the process reading the input that you do not intend to provide any more input.)
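For your pipe case, the part that is easy to miss is that the parent must close its own copy of the write end; otherwise the pipe still has a writer and read() never returns zero. Here is a rough sketch of the whole dance, assuming a fixed-size caller buffer to keep it short (use a growing buffer like read_all() in real code). capture_output() is a hypothetical name.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with its stdout connected to a pipe and
 * collect up to bufsize bytes of its output in buf (not NUL-terminated).
 * Output beyond bufsize is read and discarded.
 * Returns the number of bytes stored, or -1 on error. */
long capture_output(char *const argv[], char *buf, size_t bufsize)
{
    int fd[2], status, failed = 0;
    size_t used = 0;
    pid_t pid;

    if (pipe(fd) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: stdout becomes the write end of the pipe. */
        close(fd[0]);
        if (fd[1] != STDOUT_FILENO) {
            if (dup2(fd[1], STDOUT_FILENO) < 0)
                _exit(127);
            close(fd[1]);
        }
        execvp(argv[0], argv);
        _exit(127);                /* only reached if execvp() failed */
    }

    /* Parent: close our copy of the write end, so that the child is the
     * only writer and read() returns 0 once the child is done. */
    close(fd[1]);

    for (;;) {
        char scratch[256];
        ssize_t n;

        if (used < bufsize)
            n = read(fd[0], buf + used, bufsize - used);
        else
            n = read(fd[0], scratch, sizeof scratch);   /* buffer full */

        if (n == 0)
            break;                 /* every writer closed: end of output */
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal, retry */
            failed = 1;            /* real error */
            break;
        }
        if (used < bufsize)
            used += (size_t)n;
    }
    close(fd[0]);

    /* Reap the child so it does not stay around as a zombie. */
    while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
        ;

    return failed ? -1 : (long)used;
}
```

The exec'd program never marks the end of its output. The end is simply the point where the last open write end of the pipe goes away, which happens when the child exits.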

Originally Posted by Scriptonaut

Ya, I am taking a single $(cmd ...) found in a string, and then expanding it. Since the entire cmd expansion system is recursive, it automatically separates it at whitespaces(and condenses whitespace to a single space)

In that case, you could use the data read by read_all(), remove any embedded '\0', then append an end-of-string '\0' to the data, just like my trim() function does in the example. That converts the data into a string you can supply to your command expansion system.
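If you only want that much and none of the whitespace handling, the bare-bones version might look like the sketch below. strip_nuls() is a hypothetical name, and it assumes the buffer has one spare char at the end, which is what the reserve argument is for.

```c
#include <stddef.h>

/* Hypothetical helper: drop every embedded '\0' from data[0..len-1] in
 * place, then terminate the result. The buffer must have room for
 * len + 1 chars. Returns the length of the resulting string. */
size_t strip_nuls(char *data, size_t len)
{
    size_t i, o = 0;

    for (i = 0; i < len; i++)
        if (data[i] != '\0')
            data[o++] = data[i];

    data[o] = '\0';
    return o;
}
```

After this the data is an ordinary C string, so the string functions are safe to use on it.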

Originally Posted by Scriptonaut

I'm having trouble following this part. Read will automatically separate the buffer into tokens for me?

No, I meant you will. Read does not do anything to the data.

**Andreea** · 11-20-2012

The low-level read() and write() system calls just read/write bytes. The prototype of these functions (see man 2 read) requires only a valid memory address (void *) to read into or write from, plus a byte count.

If you want to split a char buffer you just filled with read() into tokens, you first have to make a NUL-terminated copy of it (read() does not terminate the data, and strtok() modifies the string it works on), and then use a tokenizer such as strtok() on the copy.
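Roughly like this, as a sketch only: count_tokens() is a made-up name, the delimiter set is my assumption, and a real shell would store each token instead of just counting them.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count whitespace-separated tokens in data[0..len-1]
 * by running strtok() over a NUL-terminated copy of the data.
 * Returns -1 if malloc() fails. */
int count_tokens(const char *data, size_t len)
{
    char *copy = malloc(len + 1);
    char *tok;
    int count = 0;

    if (!copy)
        return -1;

    memcpy(copy, data, len);
    copy[len] = '\0';              /* read() does not terminate the data */

    for (tok = strtok(copy, " \t\r\n"); tok; tok = strtok(NULL, " \t\r\n"))
        count++;                   /* a real shell would store tok here */

    free(copy);
    return count;
}
```

One caveat: strtok() stops at the first '\0', so if the data may contain embedded NULs, they have to be removed (or treated as separators by hand) before this works.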
