strace, clone() and execve() for children and subshells

Hello experts,

I am simply analyzing the clone() and execve() cycles for a series of commands using strace.

strace is run from a different terminal, attached to the PID of the shell, to see the complete trace.

For the following command:

bash -c "cat file1; cat file2"

it is observed that:

clone() is called for bash -c;
execve() is called for bash -c, and that process calls clone() again for cat file1, followed by execve();
no clone() is called for cat file2: it is executed by an execve() in the bash -c process itself.

Can somebody please explain where (in which process space) the second command is executing, given that no child is created and there is no subshell? And why is a separate child not created?

# strace -f -p 299 > stlog 2>&1
^C

# cat stlog | grep -n clone
18:clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLDstrace: Process 1805 attached
285:[pid  1805] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLDstrace: Process 1806 attached

# cat stlog | grep -n execve
90:[pid  1805] execve("/usr/bin/bash", ["bash", "-c", "cat filer1; cat filer2"], 0x5fa26e095730 /* 26 vars */) = 0
313:[pid  1806] execve("/usr/bin/cat", ["cat", "filer1"], 0x6476b8eb6220 /* 26 vars */) = 0
443:[pid  1805] execve("/usr/bin/cat", ["cat", "filer2"], 0x6476b8eb6890 /* 26 vars */) = 0

Maybe bash optimizes the last command: instead of

clone
  execve
exit

it does

execve

So in what space is it executing?

With so many learned people here, I am not getting any replies.

I don't understand the reason for this: either my questions are stupid and useless,
or I may have given some offense I am unaware of, because of which people may be avoiding my questions.

I can't believe that nobody here has enough knowledge to reply.

Some of us visit every day, but not all day.

It seems perfectly reasonable that a subshell (which creates separate processes for a sequence of commands) should sacrifice its own process to exec the last command over itself. There is nothing else it needs to do after that point anyway: the bash -c process inherits all the child processes regardless.
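You can also see this without strace by comparing PIDs, since `$$` in a freshly started sh is that process's own PID. The sketch below assumes bash and a POSIX sh on PATH; the external `sh -c` commands are only stand-ins for the cats. Because whether bash applies the implicit last-command optimisation depends on its version, the sketch uses an explicit `exec` to request the same thing deterministically.

```shell
#!/usr/bin/env bash
# Print three PIDs from inside one `bash -c` run:
#   1. the bash -c process itself ($$ expanded by that bash)
#   2. an external command that is NOT last: bash clone()s a child for it
#   3. an external command run with `exec`: execve() over bash, no clone()
pids=$(bash -c '
  echo "$$"
  sh -c "echo \$\$"
  exec sh -c "echo \$\$"
')

shell_pid=$(printf '%s\n' "$pids" | sed -n 1p)
child_pid=$(printf '%s\n' "$pids" | sed -n 2p)
exec_pid=$(printf '%s\n' "$pids" | sed -n 3p)

echo "bash -c process:     $shell_pid"
echo "cloned child:        $child_pid"   # differs from the bash -c PID
echo "exec'd replacement:  $exec_pid"    # same PID as bash -c
```

The third PID matching the first is exactly the "execve() instead of clone() + execve() + exit" pattern seen for cat filer2 in the trace.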

The source of bash is available online, but may be rather obscure. The conditions under which this optimisation may be disallowed may not be obvious: for example, if that last command includes redirections, or local environment changes.


Thanks for your response.

So do you mean to say that the last command is replacing the space of bash -c?

If so, what if I want my command(s) to return to the bash -c shell when they finish? Should I use two bash -c processes, knowing that one will be replaced by the last command?

And if you mean that the last command is executing by sharing the space of bash -c, then the variables defined in bash -c should be available to the last command.

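"Space" is too vague here. If you had a large shell that ran a tiny final command, letting the command keep the shell's memory would be wasteful, and execve() does not do that. What is kept is the process itself: the same PID, and therefore the Kernel still has that process's existing resources, such as its open file descriptors and current directory.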
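To make "existing resources" concrete: an open file descriptor belongs to the process, so it survives execve() unless it is marked close-on-exec. A minimal sketch, assuming bash and sh on PATH; the temporary file and the message are purely illustrative.

```shell
#!/usr/bin/env bash
# bash -c opens fd 3 on a file, then execve()s sh over itself.
# sh never opens the file itself, yet it can write to fd 3, because the
# descriptor table belongs to the process (PID), not to the program.
tmp=$(mktemp)
bash -c '
  exec 3>"$1"                          # open fd 3 in the bash -c process
  exec sh -c "echo hello from sh >&3"  # replace bash; fd 3 is still open
' _ "$tmp"
content=$(cat "$tmp")
rm -f "$tmp"
echo "$content"   # hello from sh
```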

How that works is that bash -c sets up an execve() call to the Kernel. The arguments passed to that call are the pathname of the new program to be loaded, an array of arguments, and an array of environment values. The Kernel holds copies of those temporarily, and then blitzes the old program's memory (code, data, heap and stack). Some attributes survive, notably the PID and any open file descriptors not marked close-on-exec.
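This also answers the earlier question about variables: only what bash -c places in the argument and environment arrays reaches the new program. A plain shell variable lives in bash's own memory, which execve() discards; an exported variable is copied into the environment array. A sketch, assuming bash and sh on PATH; the names `plain`, `shared`, `demo` and `arg1` are hypothetical.

```shell
#!/usr/bin/env bash
# Inside bash -c: one ordinary variable, one exported variable, then
# execve() sh with an explicit argument list ($0 = demo, $1 = arg1).
result=$(bash -c '
  plain="lives in bash memory only"
  export shared="copied into the envp array"
  exec sh -c "echo \"argv0=\$0 argv1=\$1 plain=[\$plain] shared=[\$shared]\"" demo arg1
')
echo "$result"
# argv0=demo argv1=arg1 plain=[] shared=[copied into the envp array]
```

So sharing a PID does not mean sharing variables: anything the last command needs must be exported or passed as an argument.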

All that works exactly the same whether the PID came from a recent clone() or belongs to the original bash -c: either way, that process started life holding a duplicate of its parent's resources.

The Kernel then loads the new executable program into the same PID, setting up the stack so that the new main() receives the arguments and environment passed via execve().

You probably need to read man execve. The key part is:

execve() executes the program referred to by pathname. This causes the program that is currently being run by the calling process to be replaced with a new program, with newly initialized stack, heap, and (initialized and uninitialized) data segments.

Nothing to worry about there: every process ever launched (except PID 1, init itself) came into being the same way. Your bash -c put the arguments in, and your grep -n execve output shows them being passed into execve().

The bash -c does not clone() a new process because it does not need to. If you added another command to the end of its command string, it would clone() both cats, because neither of them would be the last command any more. echo Status ${?} might be a good last command, although echo is a built-in, so neither clone() nor execve() would be needed for it. Alternatively, try sleep 5 as the last command.
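A quick way to check which commands got a process of their own is to have each one report its PID. This sketch assumes bash and sh on PATH, with two `sh -c` commands standing in for the two cats and a built-in echo as the final command.

```shell
#!/usr/bin/env bash
# Two external commands followed by a built-in: neither external command
# is last, so each needs its own clone()d child; the final echo is a
# built-in and runs inside the bash -c process itself.
pids=$(bash -c '
  sh -c "echo \$\$"
  sh -c "echo \$\$"
  echo "$$ status ${?}"
')
first=$(printf '%s\n' "$pids" | sed -n 1p)
second=$(printf '%s\n' "$pids" | sed -n 2p)
last=$(printf '%s\n' "$pids" | sed -n 3p)
shell_pid=${last%% *}   # first word of the built-in echo's output
echo "first sh: $first, second sh: $second, bash -c (built-in echo): $shell_pid"
```

Under strace -f, the same run should show a clone() and an execve() for each sh, and nothing for the echo.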

Two separate bash -c calls would each consist only of a final command, so neither of them would need to clone() a child for its cat: each bash -c would simply execve() its cat over itself.
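For a bash -c holding a single command, you can compare the PID the outer shell created for it (`$!`) with the PID the command reports. A sketch, assuming bash and sh on PATH; it relies on bash's long-standing single-command optimisation, so if your bash does not apply it, the two numbers will differ.

```shell
#!/usr/bin/env bash
# Start `bash -c` with one external command in the background.
# $! is the PID the outer shell created for bash -c; the command prints
# its own PID. Equal PIDs mean bash -c execve()d the command over itself.
out=$(
  bash -c 'sh -c "echo inner:\$\$"' &
  echo "outer:$!"
  wait
)
outer_pid=$(printf '%s\n' "$out" | sed -n 's/^outer://p')
inner_pid=$(printf '%s\n' "$out" | sed -n 's/^inner://p')
echo "bash -c PID: $outer_pid, command PID: $inner_pid"
```

The two output lines can arrive in either order, which is why they are tagged and picked out with sed rather than by position.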

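If your command in bash -c contains a pipeline, then there are more complications. For each | the shell creates a pipe, and it clone()s a separate child for every element of the pipeline (bash does this left to right), connecting each child's stdout to the next child's stdin before that child calls execve(). The parent shell keeps the read end of each pipe open until the reading child has been created, so an early writer does not get a spurious SIGPIPE.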
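One visible consequence is that even a built-in gets its own process when it is part of a pipeline. A sketch, assuming bash 4.0 or later (for `$BASHPID`, the PID of the current bash process, which unlike `$$` changes in a forked subshell).

```shell
#!/usr/bin/env bash
# $$ is the PID of the bash -c shell. The built-in echo on the left of
# the pipe runs in a clone()d child, so its $BASHPID is a different PID.
pids=$(bash -c '
  echo "$$"
  echo "$BASHPID" | cat
')
shell_pid=$(printf '%s\n' "$pids" | sed -n 1p)
pipe_pid=$(printf '%s\n' "$pids" | sed -n 2p)
echo "bash -c: $shell_pid, left side of pipeline: $pipe_pid"
```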

Final note: if the pathname leads to a script in an interpreted language (e.g. a shell, awk or python script), then the kernel reads its shebang (#!) line and, within the same execve() call, loads that interpreter binary instead, passing the script's pathname to it as an argument.
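A small way to watch this happen is to run a script whose interpreter prints the pathname it was handed. A sketch, assuming /bin/sh exists and the temporary directory is not mounted noexec; the script name `hello.sh` is hypothetical.

```shell
#!/usr/bin/env bash
# Create a tiny script with a #! line, then run it by pathname.
# The kernel sees "#!/bin/sh", loads /bin/sh instead, and passes the
# script's pathname as an argument, which sh exposes as $0.
dir=$(mktemp -d)
script="$dir/hello.sh"   # hypothetical script name
printf '%s\n' '#!/bin/sh' 'echo "interpreter was given: $0"' > "$script"
chmod +x "$script"
out=$("$script")
rm -rf "$dir"
echo "$out"
```

Under strace -f, you should see only one execve(), on the script's pathname; the switch to /bin/sh happens inside the kernel.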


Thanks Paul, for explaining this to me in so much detail.