PR server/16255: gdbserver cannot attach to a second inferior that is multi-threaded.
On Linux, we need to explicitly ptrace attach to all lwps of a
process. Because GDB might not be connected yet when an attach is
requested, and thus it may not be possible to activate thread_db, as
that requires access to symbols (IOW, gdbserver --attach), a while ago
we make linux_attach loop over the lwps as listed by /proc/PID/task to
find the lwps to attach to.
linux_attach_lwp_1 has:
...
if (initial)
/* If lwp is the tgid, we handle adding existing threads later.
Otherwise we just add lwp without bothering about any other
threads. */
ptid = ptid_build (lwpid, lwpid, 0);
else
{
/* Note that extracting the pid from the current inferior is
safe, since we're always called in the context of the same
process as this new thread. */
int pid = pid_of (current_inferior);
ptid = ptid_build (pid, lwpid, 0);
}
That "safe" comment referred to linux_attach_lwp being called by
thread-db.c. But this was clearly missed when a new call to
linux_attach_lwp_1 was added to linux_attach. As a result,
current_inferior will be set to some random process, and non-initial
lwps of the second inferior get assigned the pid of the wrong
inferior. E.g., in the case of attaching to two inferiors, for the
second inferior (and so on), non-initial lwps of the second inferior
get assigned the pid of the first inferior. This doesn't trigger on
the first inferior, when current_inferior is NULL, add_thread switches
the current inferior to the newly added thread.
Rather than making linux_attach switch current_inferior temporarily
(thus avoiding further reliance on global state), or making
linux_attach_lwp_1 get the tgid from /proc, which add extra syscalls,
and will be wrong in case of the user having originally attached
directly to a non-tgid lwp, and then that lwp spawning new clones (the
ptid.pid field of further new clones should be the same as the
original lwp's pid, which is not the tgid), we note that callers of
linux_attach_lwp/linux_attach_lwp_1 always have the right pid handy
already, so they can pass it down along with the lwpid.
The only other reason for the "initial" parameter is to error out
instead of warn in case of attach failure, when we're first attaching
to a process. There are only three callers of
linux_attach_lwp/linux_attach_lwp_1, and each wants to print a
different warn/error string, so we can just move the error/warn out of
linux_attach_lwp_1 to the callers, thus getting rid of the "initial"
parameter.
There really nothing gdbserver-specific about attaching to two
threaded processes, so this adds a new test under gdb.multi/. The
test passes cleanly against the native GNU/Linux target, but
fails/triggers the bug against GDBserver (before the patch), with the
native-extended-remote board (as plain remote doesn't support
multi-process).
Tested on x86_64 Fedora 17, with the native-extended-gdbserver board.
gdb/gdbserver/
2014-04-25 Pedro Alves <palves@redhat.com>
PR server/16255
* linux-low.c (linux_attach_fail_reason_string): New function.
(linux_attach_lwp): Delete.
(linux_attach_lwp_1): Rename to ...
(linux_attach_lwp): ... this. Take a ptid instead of a pid as
argument. Remove "initial" parameter. Return int instead of
void. Don't error or warn here.
(linux_attach): Adjust to call linux_attach_lwp. Call error on
failure to attach to the tgid. Call warning when failing to
attach to an lwp.
* linux-low.h (linux_attach_lwp): Take a ptid instead of a pid as
argument. Remove "initial" parameter. Return int instead of
void. Don't error or warn here.
(linux_attach_fail_reason_string): New declaration.
* thread-db.c (attach_thread): Adjust to linux_attach_lwp's
interface change. Use linux_attach_fail_reason_string.
gdb/
2014-04-25 Pedro Alves <palves@redhat.com>
PR server/16255
* common/linux-ptrace.c (linux_ptrace_attach_warnings): Rename to ...
(linux_ptrace_attach_fail_reason): ... this. Remove "warning: "
and newline from built string.
* common/linux-ptrace.h (linux_ptrace_attach_warnings): Rename to ...
(linux_ptrace_attach_fail_reason): ... this.
* linux-nat.c (linux_nat_attach): Adjust to use
linux_ptrace_attach_fail_reason.
gdb/testsuite/
2014-04-25 Simon Marchi <simon.marchi@ericsson.com>
Pedro Alves <palves@redhat.com>
PR server/16255
* gdb.multi/multi-attach.c: New file.
* gdb.multi/multi-attach.exp: New file.
2014-04-25 18:07:33 +00:00
|
|
|
# This testcase is part of GDB, the GNU debugger.
|
|
|
|
|
2015-01-01 09:32:14 +00:00
|
|
|
# Copyright 2007-2015 Free Software Foundation, Inc.
|
PR server/16255: gdbserver cannot attach to a second inferior that is multi-threaded.
On Linux, we need to explicitly ptrace attach to all lwps of a
process. Because GDB might not be connected yet when an attach is
requested, and thus it may not be possible to activate thread_db, as
that requires access to symbols (IOW, gdbserver --attach), a while ago
we make linux_attach loop over the lwps as listed by /proc/PID/task to
find the lwps to attach to.
linux_attach_lwp_1 has:
...
if (initial)
/* If lwp is the tgid, we handle adding existing threads later.
Otherwise we just add lwp without bothering about any other
threads. */
ptid = ptid_build (lwpid, lwpid, 0);
else
{
/* Note that extracting the pid from the current inferior is
safe, since we're always called in the context of the same
process as this new thread. */
int pid = pid_of (current_inferior);
ptid = ptid_build (pid, lwpid, 0);
}
That "safe" comment referred to linux_attach_lwp being called by
thread-db.c. But this was clearly missed when a new call to
linux_attach_lwp_1 was added to linux_attach. As a result,
current_inferior will be set to some random process, and non-initial
lwps of the second inferior get assigned the pid of the wrong
inferior. E.g., in the case of attaching to two inferiors, for the
second inferior (and so on), non-initial lwps of the second inferior
get assigned the pid of the first inferior. This doesn't trigger on
the first inferior, when current_inferior is NULL, add_thread switches
the current inferior to the newly added thread.
Rather than making linux_attach switch current_inferior temporarily
(thus avoiding further reliance on global state), or making
linux_attach_lwp_1 get the tgid from /proc, which add extra syscalls,
and will be wrong in case of the user having originally attached
directly to a non-tgid lwp, and then that lwp spawning new clones (the
ptid.pid field of further new clones should be the same as the
original lwp's pid, which is not the tgid), we note that callers of
linux_attach_lwp/linux_attach_lwp_1 always have the right pid handy
already, so they can pass it down along with the lwpid.
The only other reason for the "initial" parameter is to error out
instead of warn in case of attach failure, when we're first attaching
to a process. There are only three callers of
linux_attach_lwp/linux_attach_lwp_1, and each wants to print a
different warn/error string, so we can just move the error/warn out of
linux_attach_lwp_1 to the callers, thus getting rid of the "initial"
parameter.
There really nothing gdbserver-specific about attaching to two
threaded processes, so this adds a new test under gdb.multi/. The
test passes cleanly against the native GNU/Linux target, but
fails/triggers the bug against GDBserver (before the patch), with the
native-extended-remote board (as plain remote doesn't support
multi-process).
Tested on x86_64 Fedora 17, with the native-extended-gdbserver board.
gdb/gdbserver/
2014-04-25 Pedro Alves <palves@redhat.com>
PR server/16255
* linux-low.c (linux_attach_fail_reason_string): New function.
(linux_attach_lwp): Delete.
(linux_attach_lwp_1): Rename to ...
(linux_attach_lwp): ... this. Take a ptid instead of a pid as
argument. Remove "initial" parameter. Return int instead of
void. Don't error or warn here.
(linux_attach): Adjust to call linux_attach_lwp. Call error on
failure to attach to the tgid. Call warning when failing to
attach to an lwp.
* linux-low.h (linux_attach_lwp): Take a ptid instead of a pid as
argument. Remove "initial" parameter. Return int instead of
void. Don't error or warn here.
(linux_attach_fail_reason_string): New declaration.
* thread-db.c (attach_thread): Adjust to linux_attach_lwp's
interface change. Use linux_attach_fail_reason_string.
gdb/
2014-04-25 Pedro Alves <palves@redhat.com>
PR server/16255
* common/linux-ptrace.c (linux_ptrace_attach_warnings): Rename to ...
(linux_ptrace_attach_fail_reason): ... this. Remove "warning: "
and newline from built string.
* common/linux-ptrace.h (linux_ptrace_attach_warnings): Rename to ...
(linux_ptrace_attach_fail_reason): ... this.
* linux-nat.c (linux_nat_attach): Adjust to use
linux_ptrace_attach_fail_reason.
gdb/testsuite/
2014-04-25 Simon Marchi <simon.marchi@ericsson.com>
Pedro Alves <palves@redhat.com>
PR server/16255
* gdb.multi/multi-attach.c: New file.
* gdb.multi/multi-attach.exp: New file.
2014-04-25 18:07:33 +00:00
|
|
|
|
|
|
|
# This program is free software; you can redistribute it and/or modify
|
|
|
|
# it under the terms of the GNU General Public License as published by
|
|
|
|
# the Free Software Foundation; either version 3 of the License, or
|
|
|
|
# (at your option) any later version.
|
|
|
|
#
|
|
|
|
# This program is distributed in the hope that it will be useful,
|
|
|
|
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
# GNU General Public License for more details.
|
|
|
|
#
|
|
|
|
# You should have received a copy of the GNU General Public License
|
|
|
|
# along with this program. If not, see <http://www.gnu.org/licenses/>.
|
|
|
|
|
|
|
|
# Test attaching to multiple threaded programs.
|
|
|
|
|
|
|
|
standard_testfile
|
|
|
|
|
2015-01-09 11:04:19 +00:00
|
|
|
if {![can_spawn_for_attach]} {
|
PR server/16255: gdbserver cannot attach to a second inferior that is multi-threaded.
On Linux, we need to explicitly ptrace attach to all lwps of a
process. Because GDB might not be connected yet when an attach is
requested, and thus it may not be possible to activate thread_db, as
that requires access to symbols (IOW, gdbserver --attach), a while ago
we make linux_attach loop over the lwps as listed by /proc/PID/task to
find the lwps to attach to.
linux_attach_lwp_1 has:
...
if (initial)
/* If lwp is the tgid, we handle adding existing threads later.
Otherwise we just add lwp without bothering about any other
threads. */
ptid = ptid_build (lwpid, lwpid, 0);
else
{
/* Note that extracting the pid from the current inferior is
safe, since we're always called in the context of the same
process as this new thread. */
int pid = pid_of (current_inferior);
ptid = ptid_build (pid, lwpid, 0);
}
That "safe" comment referred to linux_attach_lwp being called by
thread-db.c. But this was clearly missed when a new call to
linux_attach_lwp_1 was added to linux_attach. As a result,
current_inferior will be set to some random process, and non-initial
lwps of the second inferior get assigned the pid of the wrong
inferior. E.g., in the case of attaching to two inferiors, for the
second inferior (and so on), non-initial lwps of the second inferior
get assigned the pid of the first inferior. This doesn't trigger on
the first inferior, when current_inferior is NULL, add_thread switches
the current inferior to the newly added thread.
Rather than making linux_attach switch current_inferior temporarily
(thus avoiding further reliance on global state), or making
linux_attach_lwp_1 get the tgid from /proc, which add extra syscalls,
and will be wrong in case of the user having originally attached
directly to a non-tgid lwp, and then that lwp spawning new clones (the
ptid.pid field of further new clones should be the same as the
original lwp's pid, which is not the tgid), we note that callers of
linux_attach_lwp/linux_attach_lwp_1 always have the right pid handy
already, so they can pass it down along with the lwpid.
The only other reason for the "initial" parameter is to error out
instead of warn in case of attach failure, when we're first attaching
to a process. There are only three callers of
linux_attach_lwp/linux_attach_lwp_1, and each wants to print a
different warn/error string, so we can just move the error/warn out of
linux_attach_lwp_1 to the callers, thus getting rid of the "initial"
parameter.
There really nothing gdbserver-specific about attaching to two
threaded processes, so this adds a new test under gdb.multi/. The
test passes cleanly against the native GNU/Linux target, but
fails/triggers the bug against GDBserver (before the patch), with the
native-extended-remote board (as plain remote doesn't support
multi-process).
Tested on x86_64 Fedora 17, with the native-extended-gdbserver board.
gdb/gdbserver/
2014-04-25 Pedro Alves <palves@redhat.com>
PR server/16255
* linux-low.c (linux_attach_fail_reason_string): New function.
(linux_attach_lwp): Delete.
(linux_attach_lwp_1): Rename to ...
(linux_attach_lwp): ... this. Take a ptid instead of a pid as
argument. Remove "initial" parameter. Return int instead of
void. Don't error or warn here.
(linux_attach): Adjust to call linux_attach_lwp. Call error on
failure to attach to the tgid. Call warning when failing to
attach to an lwp.
* linux-low.h (linux_attach_lwp): Take a ptid instead of a pid as
argument. Remove "initial" parameter. Return int instead of
void. Don't error or warn here.
(linux_attach_fail_reason_string): New declaration.
* thread-db.c (attach_thread): Adjust to linux_attach_lwp's
interface change. Use linux_attach_fail_reason_string.
gdb/
2014-04-25 Pedro Alves <palves@redhat.com>
PR server/16255
* common/linux-ptrace.c (linux_ptrace_attach_warnings): Rename to ...
(linux_ptrace_attach_fail_reason): ... this. Remove "warning: "
and newline from built string.
* common/linux-ptrace.h (linux_ptrace_attach_warnings): Rename to ...
(linux_ptrace_attach_fail_reason): ... this.
* linux-nat.c (linux_nat_attach): Adjust to use
linux_ptrace_attach_fail_reason.
gdb/testsuite/
2014-04-25 Simon Marchi <simon.marchi@ericsson.com>
Pedro Alves <palves@redhat.com>
PR server/16255
* gdb.multi/multi-attach.c: New file.
* gdb.multi/multi-attach.exp: New file.
2014-04-25 18:07:33 +00:00
|
|
|
return 0
|
|
|
|
}
|
|
|
|
|
|
|
|
if {[prepare_for_testing "failed to prepare" $testfile $srcfile {debug additional_flags=-lpthread}]} {
|
|
|
|
return -1
|
|
|
|
}
|
|
|
|
|
|
|
|
# Start the programs running and then wait for a bit, to be sure that
|
|
|
|
# they can be attached to.
|
2014-09-11 12:04:14 +00:00
|
|
|
|
testsuite: tcl exec& -> 'kill -9 $pid' is racy (attach-many-short-lived-thread.exp races and others)
The buildbots show that attach-many-short-lived-thread.exp is racy.
But after staring at debug logs and playing with SystemTap scripts for
a (long) while, I figured out that neither GDB, nor the kernel nor the
test's program itself are at fault.
The problem is simply that the testsuite machinery is currently
subject to PID-reuse races. The attach-many-short-lived-threads.c
test program just happens to be much more susceptible to trigger this
race because threads and processes share the same number space on
Linux, and the test spawns many many short lived threads in
succession, thus enlarging the race window a lot.
Part of the problem is that several tests spawn processes with "exec&"
(in order to test the "attach" command) , and then at the end of the
test, to make sure things are cleaned up, issue a 'remote_spawn "kill
-p $testpid"'. Since with tcl's "exec&", tcl itself is responsible
for reaping the process's exit status, when we go kill the process,
testpid may have already exited _and_ its status may have (and often
has) been reaped already. Thus it can happen that another process
meanwhile reuses $testpid, and that "kill" command kills the wrong
process... Frequently, that happens to be
attach-many-short-lived-thread, but this explains other test's races
as well.
In the attach-many-short-lived-threads test, it sometimes manifests
like this:
(gdb) file /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads
Reading symbols from /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads...done.
(gdb) Loaded /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads into /home/pedro/gdb/mygit/build/gdb/testsuite/../../gdb/gdb
attach 5940
Attaching to program: /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads, process 5940
warning: process 5940 is a zombie - the process has already terminated
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ptrace: Operation not permitted.
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: attach
info threads
No threads.
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: no new threads
set breakpoint always-inserted on
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: set breakpoint always-inserted on
Other times the process dies while the test is ongoing (the process is
ptrace-stopped):
(gdb) print again = 1
Cannot access memory at address 0x6020cc
(gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 2: reset timer in the inferior
(Recall that on Linux, SIGKILL is not interceptable)
And other times it dies just while we're detaching:
$4 = 319
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 2: print seconds_left
detach
Can't detach Thread 0x7fb13b7de700 (LWP 1842): No such process
(gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 2: detach
GDB mishandles the latter (it should ignore ESRCH while detaching just
like when continuing), but that's another story.
The fix here is to change spawn_wait_for_attach to use Expect's
'spawn' command instead of Tcl's 'exec&' to spawn programs, because
with spawn we control when to wait for/reap the process. That allows
killing the process by PID without being subject to pid-reuse races,
because even if the process is already dead, the kernel won't reuse
the process's PID until the zombie is reaped.
The other part of the problem lies in DejaGnu itself, unfortunately.
I have occasionally seen tests (attach-many-short-lived-threads
included, but not only that one) die with a random inexplicable
SIGTERM too, and that too is caused by the same reason, except that in
that case, the rogue SIGTERM is sent from this bit in DejaGnu's remote.exp:
exec sh -c "exec > /dev/null 2>&1 && (kill -2 $pgid || kill -2 $pid) && sleep 5 && (kill $pgid || kill $pid) && sleep 5 && (kill -9 $pgid || kill -9 $pid) &"
...
catch "wait -i $shell_id"
Even if the program exits promptly, that whole cascade of kills
carries on in the background, thus potentially killing the poor
process that manages to reuse $pid...
I sent a fix for that to the DejaGnu list:
http://lists.gnu.org/archive/html/dejagnu/2015-07/msg00000.html
With both patches in place, I haven't seen
attach-many-short-lived-threads.exp fail again.
Tested on x86_64 Fedora 20, native, gdbserver and extended-gdbserver.
gdb/testsuite/ChangeLog:
2015-07-31 Pedro Alves <palves@redhat.com>
* gdb.base/attach-pie-misread.exp: Rename $res to $test_spawn_id.
Use spawn_id_get_pid. Wait for spawn id after eof. Use
kill_wait_spawned_process instead of explicit "kill -9".
* gdb.base/attach-pie-noexec.exp: Adjust to spawn_wait_for_attach
returning a spawn id instead of a pid. Use spawn_id_get_pid and
kill_wait_spawned_process.
* gdb.base/attach-twice.exp: Likewise.
* gdb.base/attach.exp: Likewise.
(do_command_attach_tests): Use gdb_spawn_with_cmdline_opts and
gdb_test_multiple.
* gdb.base/solib-overlap.exp: Adjust to spawn_wait_for_attach
returning a spawn id instead of a pid. Use spawn_id_get_pid and
kill_wait_spawned_process.
* gdb.base/valgrind-infcall.exp: Likewise.
* gdb.multi/multi-attach.exp: Likewise.
* gdb.python/py-prompt.exp: Likewise.
* gdb.python/py-sync-interp.exp: Likewise.
* gdb.server/ext-attach.exp: Likewise.
* gdb.threads/attach-into-signal.exp (corefunc): Use
spawn_wait_for_attach, spawn_id_get_pid and
kill_wait_spawned_process.
* gdb.threads/attach-many-short-lived-threads.exp: Adjust to
spawn_wait_for_attach returning a spawn id instead of a pid. Use
spawn_id_get_pid and kill_wait_spawned_process.
* gdb.threads/attach-stopped.exp (corefunc): Use
spawn_wait_for_attach, spawn_id_get_pid and
kill_wait_spawned_process.
* gdb.base/break-interp.exp: Rename $res to $test_spawn_id.
Use spawn_id_get_pid. Wait for spawn id after eof. Use
kill_wait_spawned_process instead of explicit "kill -9".
* lib/gdb.exp (can_spawn_for_attach): Adjust comment.
(kill_wait_spawned_process, spawn_id_get_pid): New procedures.
(spawn_wait_for_attach): Use spawn instead of exec to spawn
processes. Don't map cygwin/windows pids here. Now returns a
spawn id list.
2015-07-31 19:06:24 +00:00
|
|
|
set spawn_id_list [spawn_wait_for_attach [list $binfile $binfile]]
|
|
|
|
set test_spawn_id1 [lindex $spawn_id_list 0]
|
|
|
|
set test_spawn_id2 [lindex $spawn_id_list 1]
|
|
|
|
set testpid1 [spawn_id_get_pid $test_spawn_id1]
|
|
|
|
set testpid2 [spawn_id_get_pid $test_spawn_id2]
|
PR server/16255: gdbserver cannot attach to a second inferior that is multi-threaded.
On Linux, we need to explicitly ptrace attach to all lwps of a
process. Because GDB might not be connected yet when an attach is
requested, and thus it may not be possible to activate thread_db, as
that requires access to symbols (IOW, gdbserver --attach), a while ago
we make linux_attach loop over the lwps as listed by /proc/PID/task to
find the lwps to attach to.
linux_attach_lwp_1 has:
...
if (initial)
/* If lwp is the tgid, we handle adding existing threads later.
Otherwise we just add lwp without bothering about any other
threads. */
ptid = ptid_build (lwpid, lwpid, 0);
else
{
/* Note that extracting the pid from the current inferior is
safe, since we're always called in the context of the same
process as this new thread. */
int pid = pid_of (current_inferior);
ptid = ptid_build (pid, lwpid, 0);
}
That "safe" comment referred to linux_attach_lwp being called by
thread-db.c. But this was clearly missed when a new call to
linux_attach_lwp_1 was added to linux_attach. As a result,
current_inferior will be set to some random process, and non-initial
lwps of the second inferior get assigned the pid of the wrong
inferior. E.g., in the case of attaching to two inferiors, for the
second inferior (and so on), non-initial lwps of the second inferior
get assigned the pid of the first inferior. This doesn't trigger on
the first inferior, when current_inferior is NULL, add_thread switches
the current inferior to the newly added thread.
Rather than making linux_attach switch current_inferior temporarily
(thus avoiding further reliance on global state), or making
linux_attach_lwp_1 get the tgid from /proc, which add extra syscalls,
and will be wrong in case of the user having originally attached
directly to a non-tgid lwp, and then that lwp spawning new clones (the
ptid.pid field of further new clones should be the same as the
original lwp's pid, which is not the tgid), we note that callers of
linux_attach_lwp/linux_attach_lwp_1 always have the right pid handy
already, so they can pass it down along with the lwpid.
The only other reason for the "initial" parameter is to error out
instead of warn in case of attach failure, when we're first attaching
to a process. There are only three callers of
linux_attach_lwp/linux_attach_lwp_1, and each wants to print a
different warn/error string, so we can just move the error/warn out of
linux_attach_lwp_1 to the callers, thus getting rid of the "initial"
parameter.
There really nothing gdbserver-specific about attaching to two
threaded processes, so this adds a new test under gdb.multi/. The
test passes cleanly against the native GNU/Linux target, but
fails/triggers the bug against GDBserver (before the patch), with the
native-extended-remote board (as plain remote doesn't support
multi-process).
Tested on x86_64 Fedora 17, with the native-extended-gdbserver board.
gdb/gdbserver/
2014-04-25 Pedro Alves <palves@redhat.com>
PR server/16255
* linux-low.c (linux_attach_fail_reason_string): New function.
(linux_attach_lwp): Delete.
(linux_attach_lwp_1): Rename to ...
(linux_attach_lwp): ... this. Take a ptid instead of a pid as
argument. Remove "initial" parameter. Return int instead of
void. Don't error or warn here.
(linux_attach): Adjust to call linux_attach_lwp. Call error on
failure to attach to the tgid. Call warning when failing to
attach to an lwp.
* linux-low.h (linux_attach_lwp): Take a ptid instead of a pid as
argument. Remove "initial" parameter. Return int instead of
void. Don't error or warn here.
(linux_attach_fail_reason_string): New declaration.
* thread-db.c (attach_thread): Adjust to linux_attach_lwp's
interface change. Use linux_attach_fail_reason_string.
gdb/
2014-04-25 Pedro Alves <palves@redhat.com>
PR server/16255
* common/linux-ptrace.c (linux_ptrace_attach_warnings): Rename to ...
(linux_ptrace_attach_fail_reason): ... this. Remove "warning: "
and newline from built string.
* common/linux-ptrace.h (linux_ptrace_attach_warnings): Rename to ...
(linux_ptrace_attach_fail_reason): ... this.
* linux-nat.c (linux_nat_attach): Adjust to use
linux_ptrace_attach_fail_reason.
gdb/testsuite/
2014-04-25 Simon Marchi <simon.marchi@ericsson.com>
Pedro Alves <palves@redhat.com>
PR server/16255
* gdb.multi/multi-attach.c: New file.
* gdb.multi/multi-attach.exp: New file.
2014-04-25 18:07:33 +00:00
|
|
|
|
|
|
|
gdb_test "attach $testpid1" \
|
|
|
|
"Attaching to program: .*, process $testpid1.*(in|at).*" \
|
|
|
|
"attach to program 1"
|
|
|
|
gdb_test "backtrace" ".*main.*" "backtrace 1"
|
|
|
|
|
|
|
|
gdb_test "add-inferior -exec $binfile" \
|
|
|
|
"Added inferior 2.*" \
|
|
|
|
"add second inferior"
|
|
|
|
gdb_test "inferior 2" ".*Switching to inferior 2.*" "switch to second inferior"
|
|
|
|
|
|
|
|
gdb_test "attach $testpid2" \
|
|
|
|
"Attaching to program: .*, process $testpid2.*(in|at).*" \
|
|
|
|
"attach to program 2"
|
|
|
|
gdb_test "backtrace" ".*main.*" "backtrace 2"
|
|
|
|
|
|
|
|
gdb_test "kill" "" "kill inferior 2" "Kill the program being debugged.*" "y"
|
|
|
|
gdb_test "inferior 1" ".*Switching to inferior 1.*"
|
|
|
|
gdb_test "kill" "" "kill inferior 1" "Kill the program being debugged.*" "y"
|
testsuite: tcl exec& -> 'kill -9 $pid' is racy (attach-many-short-lived-thread.exp races and others)
The buildbots show that attach-many-short-lived-thread.exp is racy.
But after staring at debug logs and playing with SystemTap scripts for
a (long) while, I figured out that neither GDB, nor the kernel nor the
test's program itself are at fault.
The problem is simply that the testsuite machinery is currently
subject to PID-reuse races. The attach-many-short-lived-threads.c
test program just happens to be much more susceptible to trigger this
race because threads and processes share the same number space on
Linux, and the test spawns many many short lived threads in
succession, thus enlarging the race window a lot.
Part of the problem is that several tests spawn processes with "exec&"
(in order to test the "attach" command) , and then at the end of the
test, to make sure things are cleaned up, issue a 'remote_spawn "kill
-p $testpid"'. Since with tcl's "exec&", tcl itself is responsible
for reaping the process's exit status, when we go kill the process,
testpid may have already exited _and_ its status may have (and often
has) been reaped already. Thus it can happen that another process
meanwhile reuses $testpid, and that "kill" command kills the wrong
process... Frequently, that happens to be
attach-many-short-lived-thread, but this explains other test's races
as well.
In the attach-many-short-lived-threads test, it sometimes manifests
like this:
(gdb) file /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads
Reading symbols from /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads...done.
(gdb) Loaded /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads into /home/pedro/gdb/mygit/build/gdb/testsuite/../../gdb/gdb
attach 5940
Attaching to program: /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads, process 5940
warning: process 5940 is a zombie - the process has already terminated
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ptrace: Operation not permitted.
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: attach
info threads
No threads.
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: no new threads
set breakpoint always-inserted on
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: set breakpoint always-inserted on
Other times the process dies while the test is ongoing (the process is
ptrace-stopped):
(gdb) print again = 1
Cannot access memory at address 0x6020cc
(gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 2: reset timer in the inferior
(Recall that on Linux, SIGKILL is not interceptable)
And other times it dies just while we're detaching:
$4 = 319
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 2: print seconds_left
detach
Can't detach Thread 0x7fb13b7de700 (LWP 1842): No such process
(gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 2: detach
GDB mishandles the latter (it should ignore ESRCH while detaching just
like when continuing), but that's another story.
The fix here is to change spawn_wait_for_attach to use Expect's
'spawn' command instead of Tcl's 'exec&' to spawn programs, because
with spawn we control when to wait for/reap the process. That allows
killing the process by PID without being subject to pid-reuse races,
because even if the process is already dead, the kernel won't reuse
the process's PID until the zombie is reaped.
The other part of the problem lies in DejaGnu itself, unfortunately.
I have occasionally seen tests (attach-many-short-lived-threads
included, but not only that one) die with a random inexplicable
SIGTERM too, and that too is caused by the same reason, except that in
that case, the rogue SIGTERM is sent from this bit in DejaGnu's remote.exp:
exec sh -c "exec > /dev/null 2>&1 && (kill -2 $pgid || kill -2 $pid) && sleep 5 && (kill $pgid || kill $pid) && sleep 5 && (kill -9 $pgid || kill -9 $pid) &"
...
catch "wait -i $shell_id"
Even if the program exits promptly, that whole cascade of kills
carries on in the background, thus potentially killing the poor
process that manages to reuse $pid...
I sent a fix for that to the DejaGnu list:
http://lists.gnu.org/archive/html/dejagnu/2015-07/msg00000.html
With both patches in place, I haven't seen
attach-many-short-lived-threads.exp fail again.
Tested on x86_64 Fedora 20, native, gdbserver and extended-gdbserver.
gdb/testsuite/ChangeLog:
2015-07-31 Pedro Alves <palves@redhat.com>
* gdb.base/attach-pie-misread.exp: Rename $res to $test_spawn_id.
Use spawn_id_get_pid. Wait for spawn id after eof. Use
kill_wait_spawned_process instead of explicit "kill -9".
* gdb.base/attach-pie-noexec.exp: Adjust to spawn_wait_for_attach
returning a spawn id instead of a pid. Use spawn_id_get_pid and
kill_wait_spawned_process.
* gdb.base/attach-twice.exp: Likewise.
* gdb.base/attach.exp: Likewise.
(do_command_attach_tests): Use gdb_spawn_with_cmdline_opts and
gdb_test_multiple.
* gdb.base/solib-overlap.exp: Adjust to spawn_wait_for_attach
returning a spawn id instead of a pid. Use spawn_id_get_pid and
kill_wait_spawned_process.
* gdb.base/valgrind-infcall.exp: Likewise.
* gdb.multi/multi-attach.exp: Likewise.
* gdb.python/py-prompt.exp: Likewise.
* gdb.python/py-sync-interp.exp: Likewise.
* gdb.server/ext-attach.exp: Likewise.
* gdb.threads/attach-into-signal.exp (corefunc): Use
spawn_wait_for_attach, spawn_id_get_pid and
kill_wait_spawned_process.
* gdb.threads/attach-many-short-lived-threads.exp: Adjust to
spawn_wait_for_attach returning a spawn id instead of a pid. Use
spawn_id_get_pid and kill_wait_spawned_process.
* gdb.threads/attach-stopped.exp (corefunc): Use
spawn_wait_for_attach, spawn_id_get_pid and
kill_wait_spawned_process.
* gdb.base/break-interp.exp: Rename $res to $test_spawn_id.
Use spawn_id_get_pid. Wait for spawn id after eof. Use
kill_wait_spawned_process instead of explicit "kill -9".
* lib/gdb.exp (can_spawn_for_attach): Adjust comment.
(kill_wait_spawned_process, spawn_id_get_pid): New procedures.
(spawn_wait_for_attach): Use spawn instead of exec to spawn
processes. Don't map cygwin/windows pids here. Now returns a
spawn id list.
2015-07-31 19:06:24 +00:00
|
|
|
|
|
|
|
kill_wait_spawned_process $test_spawn_id1
|
|
|
|
kill_wait_spawned_process $test_spawn_id2
|