C – Figure out what startpar.c (sysvinit) is doing


Fair warning: this is a long one, so bear with me! 🙂

Recently I tried to launch a watchdog script written in bash during startup. So I added a line to rc.local with the following:

su someuser -c "/home/someuser/watchdog.sh &"

watchdog.sh looks like this:

#!/bin/bash
until /home/someuser/eventMonitoring.py
do
    sleep 1
done

Everything works and the script starts. However, a new process appears in the process list and stays there forever:

UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
root      3048     1  0  1024   620   1 20:04 ?        00:00:00 startpar -f -- rc.local

My script (watchdog.sh) itself starts and detaches successfully, since its PPID is also 1. My next task was to find out what this extra process was. startpar is part of the sysvinit boot system (http://savannah.nongnu.org/projects/sysvinit). I am using Debian Wheezy 7.4.0 on this machine. man startpar says:

startpar is used to run multiple run-level scripts in parallel.

Through trial and error I figured out how to launch my script during startup without leaving startpar hanging: every file descriptor of the process has to be redirected to a file or to /dev/null, or closed altogether. That is a reasonable thing to do when you think about it. I ended up with:

su someuser -c "some_script.sh >/dev/null 2>&1 &"

This solves the problem, but it still makes me wonder why startpar behaves this way. Is this a bug or a feature?

So I dug into the code (http://svn.savannah.nongnu.org/viewvc/startpar/trunk/startpar.c?root=sysvinit&view=markup) and read it from start to finish:

First, I found where startpar -f -- rc.local is called:
Line 741:

execlp(myname, myname, "-f", "--", p->name, NULL);

OK, this re-executes startpar with new arguments. execlp replaces the image of the calling process rather than creating a new one, so it is basically startpar calling itself recursively. Let’s see what the -f parameter does:

Line 866:

case 'f':
      forw = 1;
      break;

OK, let’s see what happens when the forw variable is set to 1.
Line 900:

if (forw)
    do_forward();

Finally let’s see what this function is all about:

Line 615:

void do_forward(void)
{
  char buf[4096], *b;
  ssize_t r, rr;
  setsid();
  while ((r = read(0, buf, sizeof(buf))))
    {
      if (r < 0)
        {
          if (errno == EINTR)
            continue;
#if defined(DEBUG) && (DEBUG > 0)
          perror("\n\rstartpar: forward read");
#endif
          break;
        }
      b = buf;
      while (r > 0)
        {
          rr = write(1, b, r);
          if (rr < 0)
            {
              if (errno == EINTR)
                continue;
              perror("\n\rstartpar: forward write");
              rr = r;
            }
          r -= rr;
          b += rr;
        }
    }
  _exit(0);
}

As I understand it, this copies everything read from file descriptor 0 to file descriptor 1 until read() reports end-of-file or an error. Now let’s see what these file descriptors are actually connected to:

root@server:~# ls -al /proc/3048/fd
total 0
dr-x------ 2 root root  0 Apr  2 21:13 .
dr-xr-xr-x 8 root root  0 Apr  2 21:13 ..
lrwx------ 1 root root 64 Apr  2 21:13 0 -> /dev/ptmx
lrwx------ 1 root root 64 Apr  2 21:13 1 -> /dev/console
lrwx------ 1 root root 64 Apr  2 21:13 2 -> /dev/console

Well, that’s interesting. According to the man page, ptmx is:

The file /dev/ptmx is a character file with major number 5 
and minor number 2, usually of mode 0666 and owner.group of root.root. 
It is used to create a pseudoterminal master and slave pair.

and console:

The current console is also addressed by
/dev/console or /dev/tty0, the character device with major number 4
and minor number 0.

That’s when I came to StackOverflow. Can someone tell me what’s going on here? Am I right that startpar is stuck forwarding anything that arrives at ptmx to the console? Why would it do that? Why ptmx? Is this a bug?

Solution

Long story short

This is definitely not the fault of startpar; it is doing exactly what it promises to do in the first place. From its man page:

The output of each script is buffered and written when the script exits, so output lines of different scripts won’t mix. You can modify this behaviour by setting a timeout.


Code details

In the run() function in startpar.c:

  1. Line 422: Get a handle to the master pseudoterminal (/dev/ptmx in this example).

    p->fd = getpt();

  2. Line 429: Get the path to the corresponding slave pseudoterminal.

    else if ((m = ptsname(p->fd)) == 0 || grantpt(p->fd) || unlockpt(p->fd))

  3. Line 438: Fork a child process

    if ((p->pid = fork()) == (pid_t)-1)

  4. Line 475: Close the default stdout.

    TEMP_FAILURE_RETRY(close(1));

  5. Line 476: Open the slave pseudoterminal. open() returns the lowest free descriptor, which is now 1, i.e. the child’s stdout is now redirected to the slave pseudoterminal (and received by the master end).

    if (open(m, O_RDWR) != 1)

  6. Line 481: Stderr is captured as well by duplicating the slave pseudoterminal fd onto it.

    TEMP_FAILURE_RETRY(dup2(1, 2));

  7. Line 561: After some bookkeeping, launch the executable of interest (as a child process).

    execlp(p->name, p->arg0, (char *)0);

  8. The parent process can later capture all output and error messages of the newly started process by reading from the buffered master pseudoterminal, and write them to the actual standard output (i.e. /dev/console in this case).
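
The steps above can be sketched as a small standalone C function. This is only an illustration under stated assumptions, not startpar’s code: capture_via_pty is a hypothetical name, posix_openpt() stands in for getpt(), the command is run through sh -c, and the child uses dup2() instead of relying on open() returning fd 1.

```c
#define _XOPEN_SOURCE 600
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `sh -c cmd` with stdout/stderr attached to a pty
   slave and collect what it prints by reading the pty master.
   Returns the number of bytes captured, or -1 if no pty could be set up. */
ssize_t capture_via_pty(const char *cmd, char *out, size_t cap)
{
  if (cap == 0)
    return -1;

  int master = posix_openpt(O_RDWR | O_NOCTTY);   /* step 1 */
  if (master < 0)
    return -1;
  const char *slave_name = NULL;
  if (grantpt(master) || unlockpt(master)          /* step 2 */
      || (slave_name = ptsname(master)) == NULL)
    {
      close(master);
      return -1;
    }

  pid_t pid = fork();                              /* step 3 */
  if (pid == (pid_t)-1)
    {
      close(master);
      return -1;
    }
  if (pid == 0)
    {
      close(master);
      int slave = open(slave_name, O_RDWR | O_NOCTTY);
      if (slave < 0)
        _exit(127);
      dup2(slave, 1);                              /* steps 4 and 5 */
      dup2(slave, 2);                              /* step 6 */
      if (slave > 2)
        close(slave);
      execlp("sh", "sh", "-c", cmd, (char *)0);    /* step 7 */
      _exit(127);
    }

  /* Step 8: the parent drains the master side. The loop ends when read()
     reports end-of-file or an error, which on Linux is EIO once no process
     holds the slave open any more. */
  size_t used = 0;
  while (used < cap - 1)
    {
      ssize_t r = read(master, out + used, cap - 1 - used);
      if (r < 0 && errno == EINTR)
        continue;
      if (r <= 0)
        break;
      used += (size_t)r;
    }
  out[used] = '\0';
  close(master);
  waitpid(pid, NULL, 0);
  return (ssize_t)used;
}
```

Calling capture_via_pty("echo hello", buf, sizeof buf) should leave the output of the command in buf; the terminal layer typically turns "\n" into "\r\n". Note that the read loop only ends once nothing holds the slave open, so a backgrounded process that inherited the slave keeps the reader waiting.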


How do I prevent the dangling startpar -f... process on my system?

Method 1: Mark the executable to be launched as interactive.

Explicitly marking the executable as interactive tells startpar to skip the pseudoterminal master/slave trick it uses to buffer terminal I/O, because any output from an interactive executable needs to be displayed on screen immediately instead of being buffered.

This changes the execution flow in several places, most notably on line 1171, where startpar does not call the run() function for an interactive executable.

This has been tested and described here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=537182#20

Method 2: Discard the stdout and stderr of the executable file you want to launch.

Use the construct ">/dev/null 2>&1 &" to discard stdout/stderr of the executable to be launched. With both explicitly pointed at /dev/null, the background process no longer holds the slave pseudoterminal open, so startpar does not wait on its output indefinitely as it otherwise would.
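
Why the redirection helps can be shown with a minimal C sketch. It rests on assumptions: a plain pipe stands in for the pseudoterminal pair, and reader_sees_eof is a hypothetical name. The launcher plays the role of rc.local, the grandchild plays the backgrounded watchdog.sh, and the parent plays the forwarding startpar process.

```c
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the reader sees end-of-file right after the launcher exits,
   0 if a lingering background process still holds the write end open,
   -1 on error. A pipe stands in for startpar's pseudoterminal pair. */
int reader_sees_eof(int redirect_to_devnull)
{
  int pipefd[2];
  if (pipe(pipefd) < 0)
    return -1;

  pid_t launcher = fork();            /* plays the role of rc.local */
  if (launcher == (pid_t)-1)
    {
      close(pipefd[0]);
      close(pipefd[1]);
      return -1;
    }
  if (launcher == 0)
    {
      close(pipefd[0]);
      if (redirect_to_devnull)
        {
          /* Equivalent of ">/dev/null": replace the inherited write end. */
          int devnull = open("/dev/null", O_WRONLY);
          if (devnull < 0 || dup2(devnull, pipefd[1]) < 0)
            _exit(127);
          close(devnull);
        }
      if (fork() == 0)                /* the "watchdog.sh &" part */
        {
          sleep(1);                   /* keeps its inherited fds open */
          _exit(0);
        }
      _exit(0);                       /* the launcher itself returns at once */
    }

  close(pipefd[1]);
  waitpid(launcher, NULL, 0);

  /* Wait up to 200 ms for the hangup that signals end-of-file. */
  struct pollfd pfd = { .fd = pipefd[0], .events = POLLIN };
  int ready = poll(&pfd, 1, 200);
  close(pipefd[0]);
  if (ready < 0)
    return -1;
  return ready > 0;
}
```

Under these assumptions reader_sees_eof(0) is expected to return 0, because the background process inherited the write end, and reader_sees_eof(1) is expected to return 1, because the redirection dropped the last reference to it.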

Method 3: Set an explicit timeout for startpar

Either configure timo in startpar.c:

The timeout set with the -t option is used as buffer timeout. If the output buffer of a script is not empty and the last output was timeout seconds ago, startpar will flush the buffer.

or gtimo in startpar.c:

The -T option timeout works more globally. If no output is printed for more than global_timeout seconds, startpar will flush the buffer of the script with the oldest output. Afterwards it will only print output of this script until it is finished.
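
The two timeout rules quoted above can be written down as a small C sketch. This is a reading of the man page text, not startpar’s implementation: struct script_output, buffer_timed_out and pick_script_to_flush are hypothetical names, and it is an assumption that only scripts with pending output are candidates for the global flush.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical per-script bookkeeping; startpar's real struct differs. */
struct script_output
{
  size_t buffered;      /* bytes waiting in the output buffer */
  time_t last_output;   /* when the script last produced output */
};

/* -t rule: the buffer is not empty and the last output was at least
   `timo` seconds ago. */
int buffer_timed_out(const struct script_output *s, time_t now, int timo)
{
  return s->buffered > 0 && difftime(now, s->last_output) >= (double)timo;
}

/* -T rule: if nothing was printed for more than `gtimo` seconds, return the
   index of the script with the oldest pending output, otherwise -1. */
int pick_script_to_flush(const struct script_output *s, size_t n,
                         time_t last_print, time_t now, int gtimo)
{
  if (difftime(now, last_print) <= (double)gtimo)
    return -1;

  int oldest = -1;
  for (size_t i = 0; i < n; i++)
    if (s[i].buffered > 0
        && (oldest < 0
            || difftime(s[i].last_output, s[oldest].last_output) < 0))
      oldest = (int)i;
  return oldest;
}
```

Both functions are pure, so they can be checked with fixed timestamps without waiting for real time to pass.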
