2c8c5d375e
The buildbots show that attach-many-short-lived-threads.exp is racy.
But after staring at debug logs and playing with SystemTap scripts for
a (long) while, I figured out that neither GDB, nor the kernel, nor the
test's program itself is at fault.

The problem is simply that the testsuite machinery is currently
subject to PID-reuse races. The attach-many-short-lived-threads.c
test program just happens to be much more likely to trigger this race
because threads and processes share the same number space on Linux,
and the test spawns many short-lived threads in succession, thus
enlarging the race window a lot.

Part of the problem is that several tests spawn processes with "exec&"
(in order to test the "attach" command), and then at the end of the
test, to make sure things are cleaned up, issue a
'remote_spawn "kill -9 $testpid"'. Since with Tcl's "exec&", Tcl
itself is responsible for reaping the process's exit status, when we
go kill the process, testpid may have already exited _and_ its status
may have (and often has) been reaped already. Thus it can happen that
another process meanwhile reuses $testpid, and that "kill" command
kills the wrong process... Frequently, that happens to be
attach-many-short-lived-threads, but this explains other tests' races
as well.

In the attach-many-short-lived-threads test, it sometimes manifests
like this:

 (gdb) file /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads
 Reading symbols from /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads...done.
 (gdb) Loaded /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads into /home/pedro/gdb/mygit/build/gdb/testsuite/../../gdb/gdb
 attach 5940
 Attaching to program: /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads, process 5940
 warning: process 5940 is a zombie - the process has already terminated
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ptrace: Operation not permitted.
 (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: attach
 info threads
 No threads.
 (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: no new threads
 set breakpoint always-inserted on
 (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: set breakpoint always-inserted on

Other times the process dies while the test is ongoing (the process
is ptrace-stopped):

 (gdb) print again = 1
 Cannot access memory at address 0x6020cc
 (gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 2: reset timer in the inferior

(Recall that on Linux, SIGKILL is not interceptable.)

And other times it dies just while we're detaching:

 $4 = 319
 (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 2: print seconds_left
 detach
 Can't detach Thread 0x7fb13b7de700 (LWP 1842): No such process
 (gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 2: detach

GDB mishandles the latter (it should ignore ESRCH while detaching just
like when continuing), but that's another story.

The fix here is to change spawn_wait_for_attach to use Expect's
'spawn' command instead of Tcl's 'exec&' to spawn programs, because
with spawn we control when to wait for/reap the process. That allows
killing the process by PID without being subject to PID-reuse races,
because even if the process is already dead, the kernel won't reuse
the process's PID until the zombie is reaped.

The other part of the problem lies in DejaGnu itself, unfortunately.
I have occasionally seen tests (attach-many-short-lived-threads
included, but not only that one) die with a random inexplicable
SIGTERM too, and that has the same cause, except that in that case the
rogue SIGTERM is sent from this bit in DejaGnu's remote.exp:

    exec sh -c "exec > /dev/null 2>&1 && (kill -2 $pgid || kill -2 $pid) && sleep 5 && (kill $pgid || kill $pid) && sleep 5 && (kill -9 $pgid || kill -9 $pid) &"
    ...
    catch "wait -i $shell_id"

Even if the program exits promptly, that whole cascade of kills
carries on in the background, thus potentially killing the poor
process that manages to reuse $pid... I sent a fix for that to the
DejaGnu list:

 http://lists.gnu.org/archive/html/dejagnu/2015-07/msg00000.html

With both patches in place, I haven't seen
attach-many-short-lived-threads.exp fail again.

Tested on x86_64 Fedora 20, native, gdbserver and extended-gdbserver.

gdb/testsuite/ChangeLog:
2015-07-31  Pedro Alves  <palves@redhat.com>

	* gdb.base/attach-pie-misread.exp: Rename $res to
	$test_spawn_id. Use spawn_id_get_pid. Wait for spawn id after
	eof. Use kill_wait_spawned_process instead of explicit "kill -9".
	* gdb.base/attach-pie-noexec.exp: Adjust to spawn_wait_for_attach
	returning a spawn id instead of a pid. Use spawn_id_get_pid and
	kill_wait_spawned_process.
	* gdb.base/attach-twice.exp: Likewise.
	* gdb.base/attach.exp: Likewise.
	(do_command_attach_tests): Use gdb_spawn_with_cmdline_opts and
	gdb_test_multiple.
	* gdb.base/solib-overlap.exp: Adjust to spawn_wait_for_attach
	returning a spawn id instead of a pid. Use spawn_id_get_pid and
	kill_wait_spawned_process.
	* gdb.base/valgrind-infcall.exp: Likewise.
	* gdb.multi/multi-attach.exp: Likewise.
	* gdb.python/py-prompt.exp: Likewise.
	* gdb.python/py-sync-interp.exp: Likewise.
	* gdb.server/ext-attach.exp: Likewise.
	* gdb.threads/attach-into-signal.exp (corefunc): Use
	spawn_wait_for_attach, spawn_id_get_pid and
	kill_wait_spawned_process.
	* gdb.threads/attach-many-short-lived-threads.exp: Adjust to
	spawn_wait_for_attach returning a spawn id instead of a pid. Use
	spawn_id_get_pid and kill_wait_spawned_process.
	* gdb.threads/attach-stopped.exp (corefunc): Use
	spawn_wait_for_attach, spawn_id_get_pid and
	kill_wait_spawned_process.
	* gdb.base/break-interp.exp: Rename $res to $test_spawn_id. Use
	spawn_id_get_pid. Wait for spawn id after eof. Use
	kill_wait_spawned_process instead of explicit "kill -9".
	* lib/gdb.exp (can_spawn_for_attach): Adjust comment.
	(kill_wait_spawned_process, spawn_id_get_pid): New procedures.
	(spawn_wait_for_attach): Use spawn instead of exec to spawn
	processes. Don't map cygwin/windows pids here. Now returns a
	spawn id list.
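The kernel behaviour the fix relies on (a dead but unreaped child keeps its PID reserved) can be observed outside the testsuite. The following is only a rough sketch in Python rather than the testsuite's Tcl/Expect, assuming a Linux or other POSIX host; the helper names `pid_is_owned` and `kill_then_reap` are made up for this illustration and are not part of GDB or DejaGnu.

```python
import os
import signal
import time


def pid_is_owned(pid):
    """Return True if some process (live or zombie) currently owns pid."""
    try:
        os.kill(pid, 0)  # signal 0 performs only the existence check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the PID exists but belongs to someone else
    return True


def kill_then_reap():
    """Kill a child by PID while we still control when it is reaped."""
    pid = os.fork()
    if pid == 0:
        # Child: stand-in for the spawned test program.
        time.sleep(60)
        os._exit(0)

    os.kill(pid, signal.SIGKILL)

    # Block until the child has terminated, but leave it as a zombie
    # (WNOWAIT), which mirrors "dead but not yet reaped".
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)

    # The zombie still owns the PID, so nothing else can be using it.
    held_while_zombie = pid_is_owned(pid)

    # Only now do we reap; after this point the PID may be reused.
    _, status = os.waitpid(pid, 0)
    term_signal = os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
    return held_while_zombie, term_signal


if __name__ == "__main__":
    print(kill_then_reap())
```

The sketch deliberately does not check that the PID is free after the final `waitpid`, since another process could legitimately have taken it by then, which is exactly the race described above.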
476 lines
14 KiB
Text
# Copyright 1997-2015 Free Software Foundation, Inc.

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>. */

# On HP-UX 11.0, this test is causing a process running the program
# "attach" to be left around spinning. Until we figure out why, I am
# commenting out the test to avoid polluting tiamat (our 11.0 nightly
# test machine) with these processes. RT
#
# Setting the magic bit in the target app should work. I added a
# "kill", and also a test for the R3 register warning. JB
if { [istarget "hppa*-*-hpux*"] } {
    return 0
}

if {![can_spawn_for_attach]} {
    return 0
}

standard_testfile attach.c attach2.c
set binfile2 ${binfile}2
set escapedbinfile [string_to_regexp $binfile]

#execute_anywhere "rm -f ${binfile} ${binfile2}"
remote_exec build "rm -f ${binfile} ${binfile2}"
# For debugging this test
#
#log_user 1

# build the first test case
#
if { [gdb_compile "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable {debug}] != "" } {
    untested attach.exp
    return -1
}

# Build the in-system-call test

if { [gdb_compile "${srcdir}/${subdir}/${srcfile2}" "${binfile2}" executable {debug}] != "" } {
    untested attach.exp
    return -1
}

if [get_compiler_info] {
    return -1
}

proc do_attach_tests {} {
    global gdb_prompt
    global binfile
    global escapedbinfile
    global srcfile
    global testfile
    global subdir
    global timeout

    # Figure out a regular expression that will match the sysroot,
    # noting that the default sysroot is "target:", and also noting
    # that GDB will strip "target:" from the start of filenames when
    # operating on the local filesystem
    set sysroot ""
    set test "show sysroot"
    gdb_test_multiple $test $test {
        -re "The current system root is \"(.*)\"\..*${gdb_prompt} $" {
            set sysroot $expect_out(1,string)
        }
    }
    regsub "^target:" "$sysroot" "(target:)?" sysroot

    # Start the program running and then wait for a bit, to be sure
    # that it can be attached to.

    set test_spawn_id [spawn_wait_for_attach $binfile]
    set testpid [spawn_id_get_pid $test_spawn_id]

    # Verify that we cannot attach to nonsense.

    set test "attach to nonsense is prohibited"
    gdb_test_multiple "attach abc" "$test" {
        -re "Illegal process-id: abc\\.\r\n$gdb_prompt $" {
            pass "$test"
        }
        -re "Attaching to.*, process .*couldn't open /proc file.*$gdb_prompt $" {
            # Response expected from /proc-based systems.
            pass "$test"
        }
        -re "Can't attach to process..*$gdb_prompt $" {
            # Response expected on Cygwin
            pass "$test"
        }
        -re "Attaching to.*$gdb_prompt $" {
            fail "$test (bogus pid allowed)"
        }
    }

    # Verify that we cannot attach to nonsense even if its initial part is
    # a valid PID.

    set test "attach to digits-starting nonsense is prohibited"
    gdb_test_multiple "attach ${testpid}x" "$test" {
        -re "Illegal process-id: ${testpid}x\\.\r\n$gdb_prompt $" {
            pass "$test"
        }
        -re "Attaching to.*, process .*couldn't open /proc file.*$gdb_prompt $" {
            # Response expected from /proc-based systems.
            pass "$test"
        }
        -re "Can't attach to process..*$gdb_prompt $" {
            # Response expected on Cygwin
            pass "$test"
        }
        -re "Attaching to.*$gdb_prompt $" {
            fail "$test (bogus pid allowed)"
        }
    }

    # Verify that we cannot attach to what appears to be a valid
    # process ID, but is a process that doesn't exist. Traditionally,
    # most systems didn't have a process with ID 0, so we take that as
    # the default. However, there are a few exceptions.

    set boguspid 0
    if { [istarget "*-*-*bsd*"] } {
        # In FreeBSD 5.0, PID 0 is used for "swapper". Use -1 instead
        # (which should have the desired effect on any version of
        # FreeBSD, and probably other *BSD's too).
        set boguspid -1
    }
    set test "attach to nonexistent process is prohibited"
    gdb_test_multiple "attach $boguspid" "$test" {
        -re "Attaching to.*, process $boguspid.*No such process.*$gdb_prompt $" {
            # Response expected on ptrace-based systems (i.e. HP-UX 10.20).
            pass "$test"
        }
        -re "Attaching to.*, process $boguspid failed.*Hint.*$gdb_prompt $" {
            # Response expected on ttrace-based systems (i.e. HP-UX 11.0).
            pass "$test"
        }
        -re "Attaching to.*, process $boguspid.*denied.*$gdb_prompt $" {
            pass "$test"
        }
        -re "Attaching to.*, process $boguspid.*not permitted.*$gdb_prompt $" {
            pass "$test"
        }
        -re "Attaching to.*, process .*couldn't open /proc file.*$gdb_prompt $" {
            # Response expected from /proc-based systems.
            pass "$test"
        }
        -re "Can't attach to process..*$gdb_prompt $" {
            # Response expected on Cygwin
            pass "$test"
        }
        -re "Attaching to.*, process $boguspid.*failed.*$gdb_prompt $" {
            # Response expected on the extended-remote target.
            pass "$test"
        }
    }

    # Verify that we can attach to the process by first giving its
    # executable name via the file command, and using attach with the
    # process ID.

    # (Actually, the test system appears to do this automatically for
    # us. So, we must also be prepared to be asked if we want to
    # discard an existing set of symbols.)

    set test "set file, before attach1"
    gdb_test_multiple "file $binfile" "$test" {
        -re "Load new symbol table from.*y or n. $" {
            gdb_test "y" "Reading symbols from $escapedbinfile\.\.\.*done." \
                "$test (re-read)"
        }
        -re "Reading symbols from $escapedbinfile\.\.\.*done.*$gdb_prompt $" {
            pass "$test"
        }
    }

    set test "attach1, after setting file"
    gdb_test_multiple "attach $testpid" "$test" {
        -re "Attaching to program.*`?$escapedbinfile'?, process $testpid.*main.*at .*$srcfile:.*$gdb_prompt $" {
            pass "$test"
        }
        -re "Attaching to program.*`?$escapedbinfile\.exe'?, process $testpid.*\[Switching to thread $testpid\..*\].*$gdb_prompt $" {
            # Response expected on Cygwin
            pass "$test"
        }
    }

    # Verify that we can "see" the variable "should_exit" in the
    # program, and that it is zero.

    gdb_test "print should_exit" " = 0" "after attach1, print should_exit"

    # Detach the process.

    gdb_test "detach" \
        "Detaching from program: .*$escapedbinfile, process $testpid" \
        "attach1 detach"

    # Wait a bit for gdb to finish detaching

    exec sleep 5

    # Purge the symbols from gdb's brain. (We want to be certain the
    # next attach, which won't be preceded by a "file" command, is
    # really getting the executable file without our help.)

    set old_timeout $timeout
    set timeout 15
    set test "attach1, purging symbols after detach"
    gdb_test_multiple "file" "$test" {
        -re "No executable file now.*Discard symbol table.*y or n. $" {
            gdb_test "y" "No symbol file now." "$test"
        }
    }
    set timeout $old_timeout

    # Verify that we can attach to the process just by giving the
    # process ID.

    set test "attach2, with no file"
    set found_exec_file 0
    gdb_test_multiple "attach $testpid" "$test" {
        -re "Attaching to process $testpid.*Load new symbol table from \"$sysroot$escapedbinfile\.exe\".*y or n. $" {
            # On Cygwin, the DLL's symbol tables are loaded prior to the
            # executable's symbol table. This in turn always results in
            # asking the user for actually loading the symbol table of the
            # executable.
            gdb_test "y" "Reading symbols from $sysroot$escapedbinfile\.\.\.*done." \
                "$test (reset file)"

            set found_exec_file 1
        }
        -re "Attaching to process $testpid.*Reading symbols from $sysroot$escapedbinfile.*main.*at .*$gdb_prompt $" {
            pass "$test"
            set found_exec_file 1
        }
    }

    if {$found_exec_file == 0} {
        set test "load file manually, after attach2"
        gdb_test_multiple "file $binfile" "$test" {
            -re "A program is being debugged already..*Are you sure you want to change the file.*y or n. $" {
                gdb_test "y" "Reading symbols from $escapedbinfile\.\.\.*done." \
                    "$test (re-read)"
            }
            -re "Reading symbols from $escapedbinfile\.\.\.*done.*$gdb_prompt $" {
                pass "$test"
            }
        }
    }

    # Verify that we can modify the variable "should_exit" in the
    # program.

    gdb_test_no_output "set should_exit=1" "after attach2, set should_exit"

    # Verify that the modification really happened.

    gdb_breakpoint [gdb_get_line_number "postloop"] temporary
    gdb_continue_to_breakpoint "postloop" ".* postloop .*"

    # Allow the test process to exit, to cleanup after ourselves.

    gdb_continue_to_end "after attach2, exit"

    # Make sure we don't leave a process around to confuse
    # the next test run (and prevent the compile by keeping
    # the text file busy), in case the "set should_exit" didn't
    # work.

    kill_wait_spawned_process $test_spawn_id

    set test_spawn_id [spawn_wait_for_attach $binfile]
    set testpid [spawn_id_get_pid $test_spawn_id]

    # Verify that we can attach to the process, and find its a.out
    # when we're cd'd to some directory that doesn't contain the
    # a.out. (We use the source path set by the "dir" command.)

    gdb_test "dir [standard_output_file {}]" "Source directories searched: .*" \
        "set source path"

    gdb_test "cd /tmp" "Working directory /tmp." \
        "cd away from process working directory"

    # Explicitly flush out any knowledge of the previous attachment.

    set test "before attach3, flush symbols"
    gdb_test_multiple "symbol-file" "$test" {
        -re "Discard symbol table from.*y or n. $" {
            gdb_test "y" "No symbol file now." \
                "$test"
        }
        -re "No symbol file now.*$gdb_prompt $" {
            pass "$test"
        }
    }

    gdb_test "exec" "No executable file now." \
        "before attach3, flush exec"

    gdb_test "attach $testpid" \
        "Attaching to process $testpid.*Reading symbols from $sysroot$escapedbinfile.*main.*at .*" \
        "attach when process' a.out not in cwd"

    set test "after attach3, exit"
    gdb_test "kill" \
        "" \
        "$test" \
        "Kill the program being debugged.*y or n. $" \
        "y"

    # Another "don't leave a process around"
    kill_wait_spawned_process $test_spawn_id
}

proc do_call_attach_tests {} {
    global gdb_prompt
    global binfile2

    set test_spawn_id [spawn_wait_for_attach $binfile2]
    set testpid [spawn_id_get_pid $test_spawn_id]

    # Attach

    gdb_test "file $binfile2" ".*" "force switch to gdb64, if necessary"
    set test "attach call"
    gdb_test_multiple "attach $testpid" "$test" {
        -re "warning: reading register.*I.*O error.*$gdb_prompt $" {
            fail "$test (read register error)"
        }
        -re "Attaching to.*process $testpid.*libc.*$gdb_prompt $" {
            pass "$test"
        }
        -re "Attaching to.*process $testpid.*\[Switching to thread $testpid\..*\].*$gdb_prompt $" {
            pass "$test"
        }
    }

    # See if other registers are problems

    set test "info other register"
    gdb_test_multiple "i r r3" "$test" {
        -re "warning: reading register.*$gdb_prompt $" {
            fail "$test"
        }
        -re "r3.*$gdb_prompt $" {
            pass "$test"
        }
    }

    # Get rid of the process

    gdb_test "p should_exit = 1"
    gdb_continue_to_end

    # Be paranoid

    kill_wait_spawned_process $test_spawn_id
}

proc do_command_attach_tests {} {
    global gdb_prompt
    global binfile
    global verbose
    global GDB
    global INTERNAL_GDBFLAGS
    global GDBFLAGS

    if ![isnative] then {
        unsupported "command attach test"
        return 0
    }

    set test_spawn_id [spawn_wait_for_attach $binfile]
    set testpid [spawn_id_get_pid $test_spawn_id]

    gdb_exit

    set res [gdb_spawn_with_cmdline_opts "--pid=$testpid"]
    set test "starting with --pid"
    gdb_test_multiple "" $test {
        -re "Reading symbols from.*$gdb_prompt $" {
            pass "$test"
        }
    }

    # Get rid of the process
    kill_wait_spawned_process $test_spawn_id
}

# Test ' gdb --pid PID -ex "run" '. GDB used to have a bug where
# "run" would run before the attach finished - PR17347.

proc test_command_line_attach_run {} {
    global gdb_prompt
    global binfile

    if ![isnative] then {
        unsupported "commandline attach run test"
        return 0
    }

    with_test_prefix "cmdline attach run" {
        set test_spawn_id [spawn_wait_for_attach $binfile]
        set testpid [spawn_id_get_pid $test_spawn_id]

        set test "run to prompt"
        gdb_exit

        set res [gdb_spawn_with_cmdline_opts \
            "-iex \"set height 0\" -iex \"set width 0\" --pid=$testpid -ex \"start\""]
        if { $res != 0} {
            fail $test
            kill_wait_spawned_process $test_spawn_id
            return $res
        }
        gdb_test_multiple "" $test {
            -re {Attaching to.*Start it from the beginning\? \(y or n\) } {
                pass $test
            }
        }

        send_gdb "y\n"

        set test "run to main"
        gdb_test_multiple "" $test {
            -re "Temporary breakpoint .* main .*$gdb_prompt $" {
                pass $test
            }
        }

        # Get rid of the process
        kill_wait_spawned_process $test_spawn_id
    }
}

# Start with a fresh gdb

gdb_exit
gdb_start
gdb_reinitialize_dir $srcdir/$subdir
gdb_load ${binfile}

# This is a test of gdb's ability to attach to a running process.

do_attach_tests

# Test attaching when the target is inside a system call

gdb_exit
gdb_start

gdb_reinitialize_dir $srcdir/$subdir
do_call_attach_tests

# Test "gdb --pid"

do_command_attach_tests

test_command_line_attach_run

return 0