Commit graph

309 commits

Author SHA1 Message Date
Pedro Alves
7759842763 PR gdb/18717: internal error if non-leader thread exits process
If a non-leader thread exits the process while all other threads are
ptrace-stopped, native gdb fails an assertion.  The test added by this
commit catches it:

 /home/pedro/gdb/mygit/build/../src/gdb/linux-nat.c:3198: internal-error: linux_nat_filter_event: Assertion `lp->resumed' failed.
 A problem internal to GDB has been detected,
 further debugging may prove unreliable.
 Quit this debugging session? (y or n)
 FAIL: gdb.threads/non-leader-exit-process.exp: program exits normally (GDB internal error)

The fix is just to remove the assertion.

With that out of the way, neither GDB not GDBserver handle this
perfectly though, so I'm adding a KFAIL:

 (gdb) continue
 Continuing.
 [Thread 0x7ffff7fc0700 (LWP 15350) exited]
 No unwaited-for children left.
 Couldn't get registers: No such process.
 (gdb) KFAIL: gdb.threads/non-ldr-exit.exp: program exits normally (PRMS: gdb/18717)

gdb/ChangeLog:
2015-07-24  Pedro Alves  <palves@redhat.com>

	PR gdb/18717
	* linux-nat.c (linux_nat_filter_event): Don't assert that the lwp
	is resumed, and extend the debug log.

gdb/testsuite/ChangeLog:
2015-07-24  Pedro Alves  <palves@redhat.com>

	PR gdb/18717
	* gdb.threads/non-ldr-exit.c: New file.
	* gdb.threads/non-ldr-exit.exp: New file.
2015-07-24 17:49:17 +01:00
Pedro Alves
28bf096c62 PR threads/18127 - threads spawned by infcall end up stuck in "running" state
Refs:
 https://sourceware.org/ml/gdb/2015-03/msg00024.html
 https://sourceware.org/ml/gdb/2015-06/msg00005.html

On GNU/Linux, if an infcall spawns a thread, that thread ends up with
stuck running state.  This happens because:

 - when linux-nat.c detects a new thread, it marks them as running,
   and does not report anything to the core.

 - we skip finish_thread_state when the thread that is running the
   infcall stops.

As result, that new thread ends up with stuck "running" state, even
though it really is stopped.

On Windows, _all_ threads end up stuck in running state, not just the
one that was spawned.  That happens because when a new thread is
detected, unlike linux-nat.c, windows-nat.c reports
TARGET_WAITKIND_SPURIOUS to infrun.  It's the fact that that event
does not cause a user-visible stop that triggers the problem.  When
the target is re-resumed, we call set_running with a wildcard ptid,
which marks all thread as running.  That set_running is not suppressed
because the (leader) thread being resumed does not have in_infcall
set.  Later, when the infcall finally finishes successfully, nothing
marks all threads back to stopped.

We can trigger the same problem on all targets by having a thread
other than the one that is running the infcall report a breakpoint hit
to infrun, and then have that breakpoint not cause a stop.  That's
what the included test does.

The fix is to stop GDB from suppressing the set_running calls while
doing an infcall, and then set the threads back to stopped when the
call finishes, iff they were originally stopped before the infcall
started.  (Note the MI *running/*stopped event suppression isn't
affected.)

Tested on x86_64 GNU/Linux.

gdb/ChangeLog:
2015-06-29  Pedro Alves  <palves@redhat.com>

	PR threads/18127
	* infcall.c (run_inferior_call): On infcall success, if the thread
	was marked stopped before, reset it back to stopped.
	* infrun.c (resume): Don't suppress the set_running calls when
	doing an infcall.
	(normal_stop): Only discard the finish_thread_state cleanup if the
	infcall succeeded.

gdb/testsuite/ChangeLog:
2015-06-29  Pedro Alves  <palves@redhat.com>

	PR threads/18127
	* gdb.threads/hand-call-new-thread.c: New file.
	* gdb.threads/hand-call-new-thread.c: New file.
2015-06-29 16:07:57 +01:00
Pedro Alves
9ee417720b Cleanup signal-while-stepping-over-bp-other-thread.exp
gdb/testsuite/ChangeLog:
2015-04-10  Pedro Alves  <palves@redhat.com>

	* gdb.threads/signal-while-stepping-over-bp-other-thread.exp: Use
	gdb_test_sequence and gdb_assert.
2015-04-10 19:49:00 +01:00
Pedro Alves
07473109e1 step-over-trips-on-watchpoint.exp: Don't put addresses in test messages
Diffing test results, I noticed:

 -PASS: gdb.threads/step-over-trips-on-watchpoint.exp: displaced=on: with thread-specific bp: next: b *0x0000000000400811 thread 1
 +PASS: gdb.threads/step-over-trips-on-watchpoint.exp: displaced=on: with thread-specific bp: next: b *0x00000000004007d1 thread 1

gdb/testsuite/ChangeLog:
2015-04-10  Pedro Alves  <palves@redhat.com>

	* gdb.threads/step-over-trips-on-watchpoint.exp (do_test): Use
	test messages that don't include the breakpoint address.
2015-04-10 19:23:24 +01:00
Pedro Alves
c79d856c88 Test step-over-{lands-on-breakpoint|trips-on-watchpoint}.exp with displaced stepping
These tests exercise the infrun.c:proceed code that needs to know to
start new step overs (along with switch_back_to_stepped_thread, etc.).
That code is tricky to get right in the multitude of possible
combinations (at least):

 (native | remote)
  X (all-stop | all-stop-but-target-always-in-non-stop)
  X (displaced-stepping | in-line step-over).

The first two above are properties of the target, but the different
step-over-breakpoint methods should work with any target that supports
them.  This patch makes sure we always test both methods on all
targets.

Tested on x86-64 Fedora 20.

gdb/testsuite/ChangeLog:
2015-04-10  Pedro Alves  <palves@redhat.com>

	* gdb.threads/step-over-lands-on-breakpoint.exp (do_test): New
	procedure, factored out from ...
	(top level): ... here.  Add "set displaced-stepping" testing axis.
	* gdb.threads/step-over-trips-on-watchpoint.exp (do_test): New
	parameter "displaced".  Use it.
	(top level): Use foreach and add "set displaced-stepping" testing
	axis.
2015-04-10 13:31:59 +01:00
Pedro Alves
ebc90b50ce Make gdb.threads/step-over-trips-on-watchpoint.exp effective on !x86
This test is currently failing like this on (at least) PPC64 and s390x:

 FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: step
 FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: next: next
 FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: with thread-specific bp: step: step
 FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: with thread-specific bp: next: next

gdb.log:

 (gdb) PASS: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: set scheduler-locking off
 step
 wait_threads () at ../../../src/gdb/testsuite/gdb.threads/step-over-trips-on-watchpoint.c:49
 49        return 1; /* in wait_threads */
 (gdb) FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: step

The problem is that the test assumes that both the "watch_me = 1;" and
the "other = 1;" lines compile to a single instruction each, which
happens to be true on x86, but no necessarily true everywhere else.
The result is that the test doesn't really test what it wants to test.

Fix it by looking for the instruction that triggers the watchpoint.

gdb/ChangeLog:
2015-04-10  Pedro Alves  <palves@redhat.com>

	* gdb.threads/step-over-trips-on-watchpoint.c (child_function):
	Remove comment.
	* gdb.threads/step-over-trips-on-watchpoint.exp (do_test): Find
	both the address of the instruction that triggers the watchpoint
	and the address of the instruction immediately after, and use
	those addresses for the test.  Fix comment.
2015-04-10 13:11:32 +01:00
Pedro Alves
8d707a12ef gdb/18216: displaced step+deliver signal, a thread needs step-over, crash
The problem is that with hardware step targets and displaced stepping,
"signal FOO" when stopped at a breakpoint steps the breakpoint
instruction at the same time it delivers a signal.  This results in
tp->stepped_breakpoint set, but no step-resume breakpoint set.  When
the next stop event arrives, GDB crashes.  Irrespective of whether we
should do something more/different to step past the breakpoint in this
scenario (e.g., PR 18225), it's just wrong to assume there'll be a
step-resume breakpoint set (and was not the original intention).

gdb/ChangeLog:
2015-04-10  Pedro Alves  <palves@redhat.com>

	PR gdb/18216
	* infrun.c (process_event_stop_test): Don't assume a step-resume
	is set if tp->stepped_breakpoint is true.

gdb/testsuite/ChangeLog:
2015-04-10  Pedro Alves  <palves@redhat.com>

	PR gdb/18216
	* gdb.threads/multiple-step-overs.exp: Remove expected eof.
2015-04-10 10:36:23 +01:00
Pedro Alves
f3770638ca Add test for PR18214 and PR18216 - multiple step-overs with queued signals
Both PRs are triggered by the same use case.

PR18214 is about software single-step targets.  On those, the 'resume'
code that detects that we're stepping over a breakpoint and delivering
a signal at the same time:

  /* Currently, our software single-step implementation leads to different
     results than hardware single-stepping in one situation: when stepping
     into delivering a signal which has an associated signal handler,
     hardware single-step will stop at the first instruction of the handler,
     while software single-step will simply skip execution of the handler.
...
     Fortunately, we can at least fix this particular issue.  We detect
     here the case where we are about to deliver a signal while software
     single-stepping with breakpoints removed.  In this situation, we
     revert the decisions to remove all breakpoints and insert single-
     step breakpoints, and instead we install a step-resume breakpoint
     at the current address, deliver the signal without stepping, and
     once we arrive back at the step-resume breakpoint, actually step
     over the breakpoint we originally wanted to step over.  */

doesn't handle the case of _another_ thread also needing to step over
a breakpoint.  Because the other thread is just resumed at the PC
where it had stopped and a breakpoint is still inserted there, the
thread immediately re-traps the same breakpoint.  This test exercises
that.  On software single-step targets, it fails like this:

 KFAIL: gdb.threads/multiple-step-overs.exp: displaced=off: signal thr3: continue to sigusr1_handler
 KFAIL: gdb.threads/multiple-step-overs.exp: displaced=off: signal thr2: continue to sigusr1_handler

gdb.log (simplified):

 (gdb) continue
 Continuing.

 Breakpoint 4, child_function_2 (arg=0x0) at src/gdb/testsuite/gdb.threads/multiple-step-overs.c:66
 66            callme (); /* set breakpoint thread 2 here */
 (gdb) thread 3
 (gdb) queue-signal SIGUSR1
 (gdb) thread 1
 [Switching to thread 1 (Thread 0x7ffff7fc1740 (LWP 24824))]
 #0  main () at src/gdb/testsuite/gdb.threads/multiple-step-overs.c:106
 106       wait_threads (); /* set wait-threads breakpoint here */
 (gdb) break sigusr1_handler
 Breakpoint 5 at 0x400837: file src/gdb/testsuite/gdb.threads/multiple-step-overs.c, line 31.
 (gdb) continue
 Continuing.
 [Switching to Thread 0x7ffff7fc0700 (LWP 24828)]

 Breakpoint 4, child_function_2 (arg=0x0) at src/gdb/testsuite/gdb.threads/multiple-step-overs.c:66
 66            callme (); /* set breakpoint thread 2 here */
 (gdb) KFAIL: gdb.threads/multiple-step-overs.exp: displaced=off: signal thr3: continue to sigusr1_handler


For good measure, I made the test try displaced stepping too.  And
then I found it crashes GDB on x86-64 (a hardware step target), but
only when displaced stepping... :

 KFAIL: gdb.threads/multiple-step-overs.exp: displaced=on: signal thr1: continue to sigusr1_handler (PRMS: gdb/18216)
 KFAIL: gdb.threads/multiple-step-overs.exp: displaced=on: signal thr2: continue to sigusr1_handler (PRMS: gdb/18216)
 KFAIL: gdb.threads/multiple-step-overs.exp: displaced=on: signal thr3: continue to sigusr1_handler (PRMS: gdb/18216)

 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  0x000000000062a83a in process_event_stop_test (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4964
 4964          if (sr_bp->loc->permanent
 Setting up the environment for debugging gdb.
 Breakpoint 1 at 0x79fcfc: file src/gdb/common/errors.c, line 54.
 Breakpoint 2 at 0x50a26c: file src/gdb/cli/cli-cmds.c, line 217.
 (top-gdb) p sr_bp
 $1 = (struct breakpoint *) 0x0
 (top-gdb) bt
 #0  0x000000000062a83a in process_event_stop_test (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4964
 #1  0x000000000062a1af in handle_signal_stop (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4715
 #2  0x0000000000629097 in handle_inferior_event (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4165
 #3  0x0000000000627482 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3298
 #4  0x000000000064ad7b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56
 #5  0x00000000004c375f in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4658
 #6  0x0000000000648c47 in handle_file_event (file_ptr=0x2e0eaa0, ready_mask=1) at src/gdb/event-loop.c:658

The all-stop-non-stop series fixes this, but meanwhile, this augments
the multiple-step-overs.exp test to cover this, KFAILed.

gdb/testsuite/ChangeLog:
2015-04-08  Pedro Alves  <palves@redhat.com>

	PR gdb/18214
	PR gdb/18216
	* gdb.threads/multiple-step-overs.c (sigusr1_handler): New
	function.
	(main): Install it as SIGUSR1 handler.
	* gdb.threads/multiple-step-overs.exp (setup): Remove 'prefix'
	parameter.  Always use "setup" as prefix.  Toggle "set
	displaced-stepping" off/on depending on global.  Don't switch to
	thread 1 here.
	(top level): Add displaced stepping "off/on" test axis.  Update
	"setup" calls.  Wrap each subtest with with_test_prefix.  Test
	continuing with a queued signal in each thread.
2015-04-08 19:59:03 +01:00
Yao Qi
337532fab1 Properly set alarm value in gdb.threads/non-stop-fair-events.exp
Nowadays, the alarm value is 60, and alarm is generated on some slow
boards.  This patch is to pass DejaGNU timeout value to the program,
and move the alarm call before going to infinite loop.  If any thread
has activities, the alarm is reset.

gdb/testsuite:

2015-04-07  Yao Qi  <yao.qi@linaro.org>

	* gdb.threads/non-stop-fair-events.c (SECONDS): New macro.
	(child_function): Call alarm.
	(main): Move call to alarm into the loop.
	* gdb.threads/non-stop-fair-events.exp: Build program with
	-DTIMEOUT=$timeout.
2015-04-07 11:30:07 +01:00
Yao Qi
cafda5977a kfail two tests in no-unwaited-for-left.exp for remote target
I see these two fails in no-unwaited-for-left.exp in remote testing
for aarch64-linux target.

...
continue
Continuing.
warning: Remote failure reply: E.No unwaited-for children left.

[Thread 1084] #2 stopped.
(gdb) FAIL: gdb.threads/no-unwaited-for-left.exp: continue stops when thread 2 exits

....
continue
Continuing.
warning: Remote failure reply: E.No unwaited-for children left.

[Thread 1081] #1 stopped.
(gdb) FAIL: gdb.threads/no-unwaited-for-left.exp: continue stops when the main thread exits

I checked the gdb.log on buildbot, and find that these two fails also
appear on Debian-i686-native-extended-gdbserver and Fedora-ppc64be-native-gdbserver-m64.
I recall that they are about local/remote parity, and related RSP is missing.
There has been already a PR 14618 about it.  This patch is to kfail them
on remote target.

gdb/testsuite:

2015-04-02  Yao Qi  <yao.qi@linaro.org>

	* gdb.threads/no-unwaited-for-left.exp: Set up kfail if target
	is remote.
2015-04-02 13:51:31 +01:00
Pedro Alves
a14711808e gdb.threads/manythreads.exp: can't read "test": no such variable
If interrupt_and_wait manages to trigger the FAIL path, we get:

  ERROR OCCURED: can't read "test": no such variable

gdb/testsuite/ChangeLog:
2015-04-01  Pedro Alves  <palves@redhat.com>

	* gdb.threads/manythreads.exp (interrupt_and_wait): Pass $message
	to fail instead of non-existent $test.
2015-04-01 15:30:13 +01:00
Pedro Alves
4eec2deb06 Crash on thread id wrap around
On GNU/Linux, if the target reuses the TID of a thread that GDB still
has in its list marked as THREAD_EXITED, GDB crashes, like:

 (gdb) continue
 Continuing.
 src/gdb/thread.c:789: internal-error: set_running: Assertion `tp->state != THREAD_EXITED' failed.
 A problem internal to GDB has been detected,
 further debugging may prove unreliable.
 Quit this debugging session? (y or n) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_reuse_time (GDB internal error)

Here:

 (top-gdb) bt
 #0  internal_error (file=0x953dd8 "src/gdb/thread.c", line=789, fmt=0x953da0 "%s: Assertion `%s' failed.")
     at src/gdb/common/errors.c:54
 #1  0x0000000000638514 in set_running (ptid=..., running=1) at src/gdb/thread.c:789
 #2  0x00000000004bda42 in linux_handle_extended_wait (lp=0x16f5760, status=0, stopping=0) at src/gdb/linux-nat.c:2114
 #3  0x00000000004bfa24 in linux_nat_filter_event (lwpid=20570, status=198015) at src/gdb/linux-nat.c:3127
 #4  0x00000000004c070e in linux_nat_wait_1 (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3478
 #5  0x00000000004c1015 in linux_nat_wait (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3722
 #6  0x00000000004c92d2 in thread_db_wait (ops=0xd80b60 <thread_db_ops>, ptid=..., ourstatus=0x7fffffffd2c0, options=1)
     at src/gdb/linux-thread-db.c:1525
 #7  0x000000000066db43 in delegate_wait (self=0xd80b60 <thread_db_ops>, arg1=..., arg2=0x7fffffffd2c0, arg3=1) at src/gdb/target-delegates.c:116
 #8  0x000000000067e54b in target_wait (ptid=..., status=0x7fffffffd2c0, options=1) at src/gdb/target.c:2206
 #9  0x0000000000625111 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3275
 #10 0x0000000000648a3b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56
 #11 0x00000000004c2ecb in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4655

I managed to come up with a test that reliably reproduces this.  It
spawns enough threads for the pid number space to wrap around, so
could potentially take a while.  On my box that's 4 seconds; on
gcc110, a PPC box which has max_pid set to 65536, it's over 10
seconds.  So I made the test compute how long that would take, and cap
the time waited if it would be unreasonably long.

Tested on x86_64 Fedora 20.

gdb/ChangeLog:
2015-04-01  Pedro Alves  <palves@redhat.com>

	* linux-thread-db.c (record_thread): Readd the thread to gdb's
	list if it was marked exited.

gdb/testsuite/ChangeLog:
2015-04-01  Pedro Alves  <palves@redhat.com>

	* gdb.threads/tid-reuse.c: New file.
	* gdb.threads/tid-reuse.exp: New file.
2015-04-01 13:38:06 +01:00
Pedro Alves
a25d8bf9c5 Fix "thread apply all" with exited threads
I noticed that "thread apply all" sometimes crashes.

The problem is that thread_apply_all_command doesn take exited threads
into account, and we qsort and then walk more elements than there
really ever were put in the array.  Valgrind shows:

 The current thread <Thread ID 3> has terminated.  See `help thread'.
 (gdb) thread apply all p 1

 Thread 1 (Thread 0x7ffff7fc2740 (LWP 29579)):
 $1 = 1
 ==29576== Use of uninitialised value of size 8
 ==29576==    at 0x639CA8: set_thread_refcount (thread.c:1337)
 ==29576==    by 0x5C2C7B: do_my_cleanups (cleanups.c:155)
 ==29576==    by 0x5C2CE8: do_cleanups (cleanups.c:177)
 ==29576==    by 0x63A191: thread_apply_all_command (thread.c:1477)
 ==29576==    by 0x50374D: do_cfunc (cli-decode.c:105)
 ==29576==    by 0x506865: cmd_func (cli-decode.c:1893)
 ==29576==    by 0x7562CB: execute_command (top.c:476)
 ==29576==    by 0x647DA4: command_handler (event-top.c:494)
 ==29576==    by 0x648367: command_line_handler (event-top.c:692)
 ==29576==    by 0x7BF7C9: rl_callback_read_char (callback.c:220)
 ==29576==    by 0x64784C: rl_callback_read_char_wrapper (event-top.c:171)
 ==29576==    by 0x647CB5: stdin_event_handler (event-top.c:432)
 ==29576==
 ...

This can happen easily today as linux-nat.c/linux-thread-db.c are
forgetting to purge non-current exited threads.  But even with that
fixed, we can always do "thread apply all" with an exited thread
selected, which won't be deleted until the user switches to another
thread.  That's what the test added by this commit exercises.

Tested on x86_64 Fedora 20.

gdb/ChangeLog:
2015-03-24  Pedro Alves  <palves@redhat.com>

	* thread.c (thread_apply_all_command): Take exited threads into
	account.

gdb/testsuite/ChangeLog:
2015-03-24  Pedro Alves  <palves@redhat.com>

	* gdb.threads/no-unwaited-for-left.exp: Test "thread apply all".
2015-03-24 21:01:29 +00:00
Pedro Alves
856e7dd698 Make "set scheduler-locking step" depend on user intention, only
Currently, "set scheduler-locking step" is a bit odd.  The manual
documents it as being optimized for stepping, so that focus of
debugging does not change unexpectedly, but then it says that
sometimes other threads may run, and thus focus may indeed change
unexpectedly...  A user can then be excused to get confused and wonder
why does GDB behave like this.

I don't think a user should have to know about details of how "next"
or whatever other run control command is implemented internally to
understand when does the "scheduler-locking step" setting take effect.

This patch completes a transition that the code has been moving
towards for a while.  It makes "set scheduler-locking step" hold
threads depending on whether the _command_ the user entered was a
stepping command [step/stepi/next/nexti], or not.

Before, GDB could end up locking threads even on "continue" if for
some reason run control decides a thread needs to be single stepped
(e.g., for a software watchpoint).

After, if a "continue" happens to need to single-step for some reason,
we won't lock threads (unless when stepping over a breakpoint,
naturally).  And if a stepping command wants to continue a thread for
bit, like when skipping a function to a step-resume breakpoint, we'll
still lock threads, so focus of debugging doesn't change.

In order to make this work, we need to record in the thread structure
whether what set it running was a stepping command.

(A follow up patch will remove the "step" parameters of 'proceed' and 'resume')

FWIW, Fedora GDB, which defaults to "scheduler-locking step" (mainline
defaults to "off") carries a different patch that goes in this
direction as well.

Tested on x86_64 Fedora 20, native and gdbserver.

gdb/ChangeLog:
2015-03-24  Pedro Alves  <palves@redhat.com>

	* gdbthread.h (struct thread_control_state) <stepping_command>:
	New field.
	* infcmd.c (step_once): Pass step=1 to clear_proceed_status.  Set
	the thread's stepping_command field.
	* infrun.c (resume): Check the thread's stepping_command flag to
	determine which threads should be resumed.  Rename 'entry_step'
	local to user_step.
	(clear_proceed_status_thread): Clear 'stepping_command'.
	(schedlock_applies): Change parameter type to struct thread_info
	pointer.  Adjust.
	(find_thread_needs_step_over): Remove 'step' parameter.  Adjust.
	(switch_back_to_stepped_thread): Adjust calls to
	'schedlock_applies'.
	(_initialize_infrun): Adjust "set scheduler-locking step" help.

gdb/testsuite/ChangeLog:
2015-03-24  Pedro Alves  <palves@redhat.com>

	* gdb.threads/schedlock.exp (test_step): No longer expect that
	"set scheduler-locking step" with "next" over a function call runs
	threads unlocked.

gdb/doc/ChangeLog:
2015-03-24  Pedro Alves  <palves@redhat.com>

	* gdb.texinfo (test_step) <set scheduler-locking step>: No longer
	mention that threads may sometimes run unlocked.
2015-03-24 17:50:31 +00:00
Pedro Alves
8bf3b159e5 gdbserver/Linux: unbreak thread event randomization
Wanting to make sure the new continue-pending-status.exp test tests
both cases of threads 2 and 3 reporting an event, I added counters to
the test, to make it FAIL if events for both threads aren't seen.
Assuming a well behaved backend, and given a reasonable number of
iterations, it should PASS.

However, running that against GNU/Linux gdbserver, I found that
surprisingly, that FAILed.  GDBserver always reported the breakpoint
hit for the same thread.

Turns out that I broke gdbserver's thread event randomization
recently, with git commit 582511be ([gdbserver] linux-low.c: better
starvation avoidance, handle non-stop mode too).  In that commit I
missed that the thread structure also has a status_pending_p field...
The end result was that count_events_callback always returns 0, and
then if no thread is stepping, select_event_lwp always returns the
event thread.  IOW, no randomization is happening at all.  Quite
curious how all the other changes in that patch were sufficient to fix
non-stop-fair-events.exp anyway even with that broken.

Tested on x86_64 Fedora 20, native and gdbserver.

gdb/gdbserver/ChangeLog:
2015-03-19 Pedro Alves  <palves@redhat.com>

	* linux-low.c (count_events_callback, select_event_lwp_callback):
	Use the lwp's status_pending_p field, not the thread's.

gdb/testsuite/ChangeLog:
2015-03-19  Pedro Alves  <palves@redhat.com>

	* gdb.threads/continue-pending-status.exp (saw_thread_2)
	(saw_thread_3): New globals.
	(top level): Increment them when an event for the corresponding
	thread is seen.
	(no thread starvation): New test.
2015-03-19 12:38:05 +00:00
Pedro Alves
eb54c8bf08 native/Linux: internal error if resume is short-circuited
If the linux_nat_resume's short-circuits the resume because the
current thread has a pending status, and, a thread with a higher
number was previously stopped for a breakpoint, GDB internal errors,
like:

 /home/pedro/gdb/mygit/src/gdb/linux-nat.c:2590: internal-error: status_callback: Assertion `lp->status != 0' failed.

Fix this by make status_callback bail out earlier.  GDBserver is
already doing the same.

New test added that exercises this.

gdb/ChangeLog:
2015-03-19  Pedro Alves  <palves@redhat.com>

	* linux-nat.c (status_callback): Return early if the LWP has no
	status pending.

gdb/testsuite/ChangeLog:
2015-03-19  Pedro Alves  <palves@redhat.com>

	* gdb.threads/continue-pending-status.c: New file.
	* gdb.threads/continue-pending-status.exp: New file.
2015-03-19 12:26:49 +00:00
Pedro Alves
be9957b82f Fix gdb.threads/thread-specific-bp.exp race
Gary stumbled on this:

 (gdb) PASS: gdb.threads/thread-specific-bp.exp: all-stop: continue to end
 info threads
   Id   Target Id         Frame
 * 1    Thread 0x7ffff7fdb700 (LWP 13717) "thread-specific" end () at /home/gary/work/archer/startswith/src/gdb/testsuite/gdb.threads/thread-specific-bp.c:29
 (gdb) FAIL: gdb.threads/thread-specific-bp.exp: all-stop: thread start is gone
 info breakpoint

The problem is that "...archer/startswith/src..." has a "start" in it,
which matches the too-lax regex in the test.

Rather than tweaking the regex, we can just remove the whole "info
threads", like we removed similar ones in other files -- GDB nowadays
does this implicitly already, so things should work without it.  Thus
removing this even improves testing here a bit.

gdb/testsuite/ChangeLog:
2015-03-04  Pedro Alves  <palves@redhat.com>

	* gdb.threads/thread-specific-bp.exp: Delete "info threads" test.
2015-03-04 17:23:55 +00:00
Pedro Alves
511aee7c39 gdb.threads/clone-thread_db.c: Add missing includes and fix pthread_join call
This fixes:

> gdb compile failed, /gdb/testsuite/gdb.threads/clone-thread_db.c: In function 'main':
> /gdb/testsuite/gdb.threads/clone-thread_db.c:67:3: warning: implicit declaration of function 'alarm' [-Wimplicit-function-declaration]
>    alarm (300);
>    ^
> /gdb/testsuite/gdb.threads/clone-thread_db.c:69:3: warning: implicit declaration of function 'pthread_create' [-Wimplicit-function-declaration]
>    pthread_create (&child, NULL, thread_fn, NULL);
>    ^
> /gdb/testsuite/gdb.threads/clone-thread_db.c:70:3: warning: implicit declaration of function 'pthread_join' [-Wimplicit-function-declaration]
>    pthread_join (child);
>    ^

And then adding the missing headers revealed the pthread_join call was
incorrect.  This probably fixes the crash we see on ppc64be, e.g., at

 https://sourceware.org/ml/gdb-testers/2015-q1/msg04415.html

the logs there show:

 ...
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x3fffb7ff54a0 (LWP 9275)]
 0x00003fffb7f3ce74 in .pthread_join () from /lib64/libpthread.so.0
 (gdb) FAIL: gdb.threads/clone-thread_db.exp: continue to end
 ...

Tested on x86_64 Fedora 20.

gdb/testsuite/
2015-03-04  Pedro Alves  <palves@redhat.com>

	* gdb.threads/clone-thread_db.c: Include unistd.h and pthread.h.
	(main): Pass missing retval argument to pthread_join call.
2015-03-04 09:13:49 +00:00
Pedro Alves
95e50b2723 follow-exec: delete all non-execing threads
This fixes invalid reads Valgrind first caught when debugging against
a GDBserver patched with a series that adds exec events to the remote
protocol.  Like these, using the gdb.threads/thread-execl.exp test:

$ valgrind ./gdb -data-directory=data-directory ./testsuite/gdb.threads/thread-execl  -ex "tar extended-remote :9999" -ex "b thread_execler" -ex "c" -ex "set scheduler-locking on"
...
Breakpoint 1, thread_execler (arg=0x0) at src/gdb/testsuite/gdb.threads/thread-execl.c:29
29        if (execl (image, image, NULL) == -1)
(gdb) n
Thread 32509.32509 is executing new program: build/gdb/testsuite/gdb.threads/thread-execl
[New Thread 32509.32532]
==32510== Invalid read of size 4
==32510==    at 0x5AA7D8: delete_breakpoint (breakpoint.c:13989)
==32510==    by 0x6285D3: delete_thread_breakpoint (thread.c:100)
==32510==    by 0x628603: delete_step_resume_breakpoint (thread.c:109)
==32510==    by 0x61622B: delete_thread_infrun_breakpoints (infrun.c:2928)
==32510==    by 0x6162EF: for_each_just_stopped_thread (infrun.c:2958)
==32510==    by 0x616311: delete_just_stopped_threads_infrun_breakpoints (infrun.c:2969)
==32510==    by 0x616C96: fetch_inferior_event (infrun.c:3267)
==32510==    by 0x63A2DE: inferior_event_handler (inf-loop.c:57)
==32510==    by 0x4E0E56: remote_async_serial_handler (remote.c:11877)
==32510==    by 0x4AF620: run_async_handler_and_reschedule (ser-base.c:137)
==32510==    by 0x4AF6F0: fd_event (ser-base.c:182)
==32510==    by 0x63806D: handle_file_event (event-loop.c:762)
==32510==  Address 0xcf333e0 is 16 bytes inside a block of size 200 free'd
==32510==    at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==32510==    by 0x77CB74: xfree (common-utils.c:98)
==32510==    by 0x5AA954: delete_breakpoint (breakpoint.c:14056)
==32510==    by 0x5988BD: update_breakpoints_after_exec (breakpoint.c:3765)
==32510==    by 0x61360F: follow_exec (infrun.c:1091)
==32510==    by 0x6186FA: handle_inferior_event (infrun.c:4061)
==32510==    by 0x616C55: fetch_inferior_event (infrun.c:3261)
==32510==    by 0x63A2DE: inferior_event_handler (inf-loop.c:57)
==32510==    by 0x4E0E56: remote_async_serial_handler (remote.c:11877)
==32510==    by 0x4AF620: run_async_handler_and_reschedule (ser-base.c:137)
==32510==    by 0x4AF6F0: fd_event (ser-base.c:182)
==32510==    by 0x63806D: handle_file_event (event-loop.c:762)
==32510==
[Switching to Thread 32509.32532]

Breakpoint 1, thread_execler (arg=0x0) at src/gdb/testsuite/gdb.threads/thread-execl.c:29
29        if (execl (image, image, NULL) == -1)
(gdb)

The breakpoint in question is the step-resume breakpoint of the
non-main thread, the one that was "next"ed.

The exact same issue can be seen on mainline with native debugging, by
running the thread-execl.exp test in non-stop mode, because the kernel
doesn't report a thread exit event for the execing thread.

Tested on x86_64 Fedora 20.

gdb/ChangeLog:
2015-03-02  Pedro Alves  <palves@redhat.com>

	* infrun.c (follow_exec): Delete all threads of the process except
	the event thread.  Extended comments.

gdb/testsuite/ChangeLog:
2015-03-02  Pedro Alves  <palves@redhat.com>

	* gdb.threads/thread-execl.exp (do_test): Handle non-stop.
	(top level): Call do_test with non-stop as well.
2015-03-03 01:25:17 +00:00
Pedro Alves
a47cd6e95a gdb.threads/multi-create-ns-info-thr.exp and native-extended-remote board
The buildbot shows that the new
gdb.threads/multi-create-ns-info-thr.exp test is timing out when
tested with --target=native-extended-remote.  The reason is:

 No breakpoints or watchpoints.
 (gdb) break main
 Breakpoint 1 at 0x10000b00: file ../../../binutils-gdb/gdb/testsuite/gdb.threads/multi-create.c, line 72.
 (gdb) run
 Starting program: /home/gdb-buildbot/fedora-21-ppc64be-1/fedora-ppc64be-native-extended-gdbserver/build/gdb/testsuite/outputs/gdb.threads/multi-create-ns-info-thr/multi-cre
 ate-ns-info-thr
 Process /home/gdb-buildbot/fedora-21-ppc64be-1/fedora-ppc64be-native-extended-gdbserver/build/gdb/testsuite/outputs/gdb.threads/multi-create-ns-info-thr/multi-create-ns-inf
 o-thr created; pid = 16266
 Unexpected vCont reply in non-stop mode: T0501:00003fffffffd190;40:00000080560fe290;thread:p3f8a.3f8a;core:0;
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 (gdb) break multi-create.c:45
 Breakpoint 2 at 0x10000994: file ../../../binutils-gdb/gdb/testsuite/gdb.threads/multi-create.c, line 45.
 (gdb) commands
 Type commands for breakpoint(s) 2, one per line.

Non-stop tests don't really work with the
--target_board=native-extended-remote board, because tests toggle
non-stop on after GDB is already connected to gdbserver, while
Currently, non-stop must be enabled before connecting.

This adjusts the test to bail if running to main fails, like all other
non-stop tests.

Note non-stop tests do work with --target_board=native-gdbserver.

gdb/testsuite/ChangeLog:
2015-02-21  Pedro Alves  <palves@redhat.com>

	* gdb.threads/multi-create-ns-info-thr.exp: Return early if
	runto_main fails.
2015-02-21 12:03:23 +00:00
Pedro Alves
2db9a4275c GNU/Linux: Stop using libthread_db/td_ta_thr_iter
TL;DR - GDB can hang if something refreshes the thread list out of the
target while the target is running.  GDB hangs inside td_ta_thr_iter.
The fix is to not use that libthread_db function anymore.

Long version:

Running the testsuite against my all-stop-on-top-of-non-stop series is
still exposing latent non-stop bugs.

I was originally seeing this with the multi-create.exp test, back when
we were still using libthread_db thread event breakpoints.  The
all-stop-on-top-of-non-stop series forces a thread list refresh each
time GDB needs to start stepping over a breakpoint (to pause all
threads).  That test hits the thread event breakpoint often, resulting
in a bunch of step-over operations, thus a bunch of thread list
refreshes while some threads in the target are running.

The commit adds a real non-stop mode test that triggers the issue,
based on multi-create.exp, that does an explicit "info threads" when a
breakpoint is hit.  IOW, it does the same things the as-ns series was
doing when testing multi-create.exp.

The bug is a race, so it unfortunately takes several runs for the test
to trigger it.  In fact, even when setting the test running in a loop,
it sometimes takes several minutes for it to trigger for me.

The race is related to libthread_db's td_ta_thr_iter.  This is
libthread_db's entry point for walking the thread list of the
inferior.

Sometimes, when GDB refreshes the thread list from the target,
libthread_db's td_ta_thr_iter can somehow see glibc's thread list as a
cycle, and get stuck in an infinite loop.

The issue is that when a thread exits, its thread control structure in
glibc is moved from a "used" list to a "cache" list.  These lists are
simply circular linked lists where the "next/prev" pointers are
embedded in the thread control structure itself.  The "next" pointer
of the last element of the list points back to the list's sentinel
"head".  There's only one set of "next/prev" pointers for both lists;
thus a thread can only be in one of the lists at a time, not in both
simultaneously.

So when thread C exits, simplifying, the following happens.  A-C are
threads.  stack_used and stack_cache are the list's heads.

Before:

  stack_used -> A -> B -> C -> (&stack_used)
  stack_cache -> (&stack_cache)

After:

  stack_used -> A -> B -> (&stack_used)
  stack_cache -> C -> (&stack_cache)

td_ta_thr_iter starts by iterating at the list's head's next, and
iterates until it sees a thread whose next pointer points to the
list's head again.  Thus in the before case above, C's next points to
stack_used, indicating end of list.  In the same case, the stack_cache
list is empty.

For each thread being iterated, td_ta_thr_iter reads the whole thread
object out of the inferior.  This includes the thread's "next"
pointer.

In the scenario above, it may happen that td_ta_thr_iter is iterating
thread B and has already read B's thread structure just before thread
C exits and its control structure moves to the cached list.

Now, recall that td_ta_thr_iter is running in the context of GDB, and
there's no locking between GDB and the inferior.  From it's local copy
of B, td_ta_thr_iter believes that the next thread after B is thread
C, so it happilly continues iterating to C, a thread that has already
exited, and is now in the stack cache list.

After iterating C, td_ta_thr_iter finds the stack_cache head, which
because it is not stack_used, td_ta_thr_iter assumes it's just another
thread.  After this, unless the reverse race triggers, GDB gets stuck
in td_ta_thr_iter forever walking the stack_cache list, as no thread
in thatlist has a next pointer that points back to stack_used (the
terminating condition).

Before fully understanding the issue, I tried adding cycle detection
to GDB's td_ta_thr_iter callback.  However, td_ta_thr_iter skips
calling the callback in some cases, which means that it's possible
that the callback isn't called at all, making it impossible for GDB to
break the loop.  I did manage to get GDB stuck in that state more than
once.

Fortunately, we can avoid the issue altogether.  We don't really need
td_ta_thr_iter for live debugging nowadays, given PTRACE_EVENT_CLONE.
We already know how to map and lwp id to a thread id without iterating
(thread_from_lwp), so use that more.

gdb/ChangeLog:
2015-02-20  Pedro Alves  <palves@redhat.com>

	* linux-nat.c (linux_handle_extended_wait): Call
	thread_db_notice_clone whenever a new clone LWP is detected.
	(linux_stop_and_wait_all_lwps, linux_unstop_all_lwps): New
	functions.
	* linux-nat.h (thread_db_attach_lwp): Delete declaration.
	(thread_db_notice_clone, linux_stop_and_wait_all_lwps)
	(linux_unstop_all_lwps): Declare.
	* linux-thread-db.c (struct thread_get_info_inout): Delete.
	(thread_get_info_callback): Delete.
	(thread_from_lwp): Use td_thr_get_info and record_thread.
	(thread_db_attach_lwp): Delete.
	(thread_db_notice_clone): New function.
	(try_thread_db_load_1): If /proc is mounted and shows the
	process'es task list, walk over all LWPs and call thread_from_lwp
	instead of relying on td_ta_thr_iter.
	(attach_thread): Don't call check_thread_signals here.  Split the
	tail part of the function (which adds the thread to the core GDB
	thread list) to ...
	(record_thread): ... this function.  Call check_thread_signals
	here.
	(thread_db_wait): Don't call thread_db_find_new_threads_1.  Always
	call thread_from_lwp.
	(thread_db_update_thread_list): Rename to ...
	(thread_db_update_thread_list_org): ... this.
	(thread_db_update_thread_list): New function.
	(thread_db_find_thread_from_tid): Delete.
	(thread_db_get_ada_task_ptid): Simplify.
	* nat/linux-procfs.c: Include <sys/stat.h>.
	(linux_proc_task_list_dir_exists): New function.
	* nat/linux-procfs.h (linux_proc_task_list_dir_exists): Declare.

gdb/gdbserver/ChangeLog:
2015-02-20  Pedro Alves  <palves@redhat.com>

	* thread-db.c: Include "nat/linux-procfs.h".
	(thread_db_init): Skip listing new threads if the kernel supports
	PTRACE_EVENT_CLONE and /proc/PID/task/ is accessible.

gdb/testsuite/ChangeLog:
2015-02-20  Pedro Alves  <palves@redhat.com>

	* gdb.threads/multi-create-ns-info-thr.exp: New file.
2015-02-20 21:40:31 +00:00
Pedro Alves
5c5019c27c PR18006: internal error if threaded program calls clone(CLONE_VM)
On GNU/Linux, if a pthreaded program has a thread call clone(CLONE_VM)
directly, and then that clone LWP hits a debug event (breakpoint,
etc.) GDB internal errors.  Threaded programs shouldn't really be
calling clone directly, but GDB shouldn't crash either.

The crash looks like this:

 (gdb) break clone_fn
 Breakpoint 2 at 0x4007d8: file clone-thread_db.c, line 35.
 (gdb) r
 ...
 [Thread debugging using libthread_db enabled]
 ...
 src/gdb/linux-nat.c:1030: internal-error: lin_lwp_attach_lwp: Assertion `lwpid > 0' failed.
 A problem internal to GDB has been detected,
 further debugging may prove unreliable.

The problem is that 'clone' ends up clearing the parent thread's tid
field in glibc's thread data structure.  For x86_64, the glibc code in
question is here:

  sysdeps/unix/sysv/linux/x86_64/clone.S:

   ...
          testq   $CLONE_THREAD, %rdi
          jne     1f
          testq   $CLONE_VM, %rdi
          movl    $-1, %eax            <----
          jne     2f
          movl    $SYS_ify(getpid), %eax
          syscall
  2:      movl    %eax, %fs:PID
          movl    %eax, %fs:TID        <----
  1:

When GDB refreshes the thread list out of libthread_db, it finds a
thread with LWP with pid -1 (the clone's parent), which naturally
isn't yet on the thread list.  GDB then tries to attach to that bogus
LWP id, which is caught by that assertion.

The fix is to detect the bad PID early.

Tested on x86-64 Fedora 20.  GDBserver doesn't need any fix.

gdb/ChangeLog:
2015-02-20  Pedro Alves  <palves@redhat.com>

	PR threads/18006
	* linux-thread-db.c (thread_get_info_callback): Return early if
	the thread's lwp id is -1.

gdb/testsuite/ChangeLog:
2015-02-20  Pedro Alves  <palves@redhat.com>

	PR threads/18006
	* gdb.threads/clone-thread_db.c: New file.
	* gdb.threads/clone-thread_db.exp: New file.
2015-02-20 19:00:21 +00:00
Pedro Alves
0703599a49 Fix adjust_pc_after_break, remove still current thread check
On decr_pc_after_break targets, GDB adjusts the PC incorrectly if a
background single-step stops somewhere where PC-$decr_pc has a
breakpoint, and the thread that finishes the step is not the current
thread, like:

   ADDR1 nop <-- breakpoint here
   ADDR2 jmp PC

IOW, say thread A is stepping ADDR2's line in the background (an
infinite loop), and the user switches focus to thread B.  GDB's
adjust_pc_after_break logic confuses the single-step stop of thread A
for a hit of the breakpoint at ADDR1, and thus adjusts thread A's PC
to point at ADDR1 when it should not, and reports a breakpoint hit,
when thread A did not execute the instruction at ADDR1 at all.

The test added by this patch exercises exactly that.

I can't find any reason we'd need the "thread to be examined is still
the current thread" condition in adjust_pc_after_break, at least
nowadays; it might have made sense in the past.  Best just remove it,
and rely on currently_stepping().

Here's the test's log of a run with an unpatched GDB:

 35        while (1);
 (gdb) PASS: gdb.threads/step-bg-decr-pc-switch-thread.exp: next over nop
 next&
 (gdb) PASS: gdb.threads/step-bg-decr-pc-switch-thread.exp: next& over inf loop
 thread 1
 [Switching to thread 1 (Thread 0x7ffff7fc2740 (LWP 29027))](running)
 (gdb)
 PASS: gdb.threads/step-bg-decr-pc-switch-thread.exp: switch to main thread
 Breakpoint 2, thread_function (arg=0x0) at ...src/gdb/testsuite/gdb.threads/step-bg-decr-pc-switch-thread.c:34
 34        NOP; /* set breakpoint here */
 FAIL: gdb.threads/step-bg-decr-pc-switch-thread.exp: no output while stepping

gdb/ChangeLog:
2015-02-11  Pedro Alves  <pedro@codesourcery.com>

	* infrun.c (adjust_pc_after_break): Don't adjust the PC just
	because the event thread is not the current thread.

gdb/testsuite/ChangeLog:
2015-02-11  Pedro Alves  <pedro@codesourcery.com>

	* gdb.threads/step-bg-decr-pc-switch-thread.c: New file.
	* gdb.threads/step-bg-decr-pc-switch-thread.exp: New file.
2015-02-11 09:45:41 +00:00
Pedro Alves
01b088bc51 Add "signal SIGTRAP" test
Some local changes I was working on related to SIGTRAP handling
resulted in "signal SIGTRAP" no longer passing the SIGTRAP to the
inferior.

Surprisingly, only annota1.exp catches this.  This commit adds a test
that doesn't rely on annotations, so that at the point annotations are
finaly dropped, we still have this use case covered ...

This is a multi-threaded test to also exercise the case of first
needing to do a step-over before delivering the signal.

Tested on x86_64 Fedora 20, native, remote/extended-remote gdbserver.

gdb/testsuite/
2015-02-10  Pedro Alves  <palves@redhat.com>

	* gdb.threads/signal-sigtrap.c: New file.
	* gdb.threads/signal-sigtrap.exp: New file.
2015-02-10 19:30:55 +00:00
Pedro Alves
e584fdbc6a Improve gdb.threads/attach-many-short-lived-threads.exp timeout handling
The buildbot shows that this test is still racy, and occasionally
fails with time outs on some machines.  I'd like to get major issues
with load out of the way.

The test currently exits after 180s, which is just a random number,
that has no relation to what the .exp file considers a time out.  This
commit makes the program wait a bit longer than what the .exp file
considers a time out, and, resets the timer for each iteration.

Tested on x86_64 Fedora 20, native and extended-remote gdbserver.

gdb/testsuite/
2015-02-06  Pedro Alves  <palves@redhat.com>

	* gdb.threads/attach-many-short-lived-threads.c (SECONDS): New
	macro.
	(seconds_left, again): New globals.
	(main): Wait seconds_left in a 1-second sleep loop instead of
	sleeping 180 seconds.  If 'again' is set, reset the seconds
	counter.
	* gdb.threads/attach-many-short-lived-threads.exp (test): Set
	'again' in the inferior before detaching.  Print the seconds left.
	(options): New global.
	(top level): Build program with	-DTIMEOUT=$timeout.
2015-02-06 13:24:32 +01:00
Mark Wielaard
37bc665e4e Remove testsuite compile errors with GCC5.
GCC5 defaults to the GNU11 standard for C and warns by default for
implicit function declarations and implicit return types.
https://gcc.gnu.org/gcc-5/porting_to.html

Fixing these issues in the testsuite turns 9 untested and 17 unsupported
testcases into 417 new passes when compiling with GCC5.

gdb/testsuite/ChangeLog:

        * gdb.arch/i386-bp_permanent.c (standard): New declaration.
        * gdb.base/disp-step-fork.c: Include unistd.h.
        * gdb.base/siginfo-obj.c: Include stdio.h.
        * gdb.base/siginfo-thread.c: Likewise.
        * gdb.mi/non-stop.c: Include unistd.h.
        * gdb.mi/nsthrexec.c: Include stdio.h.
        * gdb.mi/pthreads.c: Include unistd.h.
        * gdb.modula2/unbounded1.c (main): Declare returns int.
        * gdb.reverse/consecutive-reverse.c: Likewise.
        * gdb.threads/create-fail.c: Include unistd.h.
        * gdb.threads/killed.c: Likewise.
        * gdb.threads/linux-dp.c: Likewise.
        * gdb.threads/non-ldr-exc-1.c: Include stdio.h and string.h.
        * gdb.threads/non-ldr-exc-2.c: Likewise.
        * gdb.threads/non-ldr-exc-3.c: Likewise.
        * gdb.threads/non-ldr-exc-4.c: Likewise.
        * gdb.threads/pthreads.c: Include unistd.h.
        (main): Declare returns int.
        * gdb.threads/tls-main.c (foo): New declaration.
        * gdb.threads/watchpoint-fork-mt.c: Define _GNU_SOURCE.
2015-01-25 18:50:56 +01:00
Pedro Alves
198297aafb Linux: make target_is_async_p return false when async is off
linux_nat_is_async_p currently always returns true, even when the
target is _not_ async.  That confuses
gdb_readline_wrapper/gdb_readline_wrapper_cleanup, which
force-disables target-async while the secondary prompt is active.  As
a result, when gdb_readline_wrapper returns, the target is left async,
even through it was sync to begin with.

That can result in weird bugs, like the one the test added by this
commit exposes.

Ref: https://sourceware.org/ml/gdb-patches/2015-01/msg00592.html

gdb/ChangeLog:
2015-01-23  Pedro Alves  <palves@redhat.com>

	* linux-nat.c (linux_is_async_p): New macro.
	(linux_nat_is_async_p):
	(linux_nat_terminal_inferior): Check whether the target can async
	instead of whether it is already async.
	(linux_nat_terminal_ours): Don't check whether the target is
	async.
	(linux_async_pipe): Use linux_is_async_p.

gdb/testsuite/ChangeLog:
2015-01-23  Pedro Alves  <palves@redhat.com>

	* gdb.threads/continue-pending-after-query.c: New file.
	* gdb.threads/continue-pending-after-query.exp: New file.
2015-01-23 11:12:39 +00:00
Pedro Alves
ede9f622af add non-stop test that stresses thread starvation issues
This commit adds a non-stop mode test originally inspired by
signal-while-stepping-over-bp-other-thread.exp, that exposes the
thread starvation issues fixed by the previous patches.  It sets a set
of threads stepping in parallel, and has one of them get a signal.
Without the previous fixes, this would fail with timeouts.

gdb/testsuite/
2015-01-09  Pedro Alves  <palves@redhat.com>

	* gdb.threads/non-stop-fair-events.c: New file.
	* gdb.threads/non-stop-fair-events.exp: New file.
2015-01-09 14:44:42 +00:00
Pedro Alves
9665ffdd59 gdb.threads/{siginfo-thread.c,watchthreads-reorder.c,ia64-sigill.c} races with GDB
These three test all spawn a few threads and then send a SIGSTOP to
their parent GDB in order to pause it while the new threads set things
up for the test.  With a GDB patch that changes the inferior thread's
scheduling a bit, I sometimes see:

  FAIL: gdb.threads/siginfo-threads.exp: catch signal 0 (timeout)
  ...
  FAIL: gdb.threads/watchthreads-reorder.exp: reorder1: continue a (timeout)
  ...
  FAIL: gdb.threads/ia64-sigill.exp: continue (timeout)
  ...

The issue is that the test program stops GDB before it had a chance of
processing the new thread's clone event:

  (gdb) PASS: gdb.threads/siginfo-threads.exp: get pid
  continue
  Continuing.
  Stopping GDB PID 21541.
  Waiting till the threads initialize their TIDs.
  FAIL: gdb.threads/siginfo-threads.exp: catch signal 0 (timeout)

On Linux (at least), new threads start stopped, and the debugger must
resume them.  The fix is to make the test program wait for the new
threads to be running before stopping GDB.

gdb/testsuite/
2015-01-09  Pedro Alves  <palves@redhat.com>

	* gdb.threads/ia64-sigill.c (threads_started_barrier): New global.
	(thread_func): Wait on barrier.
	(main): Wait for all threads to start before stopping GDB.
	* gdb.threads/siginfo-threads.c (threads_started_barrier): New
	global.
	(thread1_func, thread2_func): Wait on barrier.
	(main): Wait for all threads to start before stopping GDB.
	* gdb.threads/watchthreads-reorder.c (threads_started_barrier):
	New global.
	(thread1_func, thread2_func): Wait on barrier.
	(main): Wait for all threads to start before stopping GDB.
2015-01-09 13:58:29 +00:00
Pedro Alves
c945a99f01 Test attaching to a program that constantly spawns short-lived threads
Before the previous fixes, on Linux, this would trigger several
different problems, like:

 [New LWP 27106]
 [New LWP 27047]
 warning: unable to open /proc file '/proc/-1/status'
 [New LWP 27813]
 [New LWP 27869]
 warning: Can't attach LWP 11962: No child processes
 Warning: couldn't activate thread debugging using libthread_db: Cannot find new threads: debugger service failed
 warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.

gdb/testsuite/
2015-01-09  Pedro Alves  <palves@redhat.com>

	* gdb.threads/attach-many-short-lived-threads.c: New file.
	* gdb.threads/attach-many-short-lived-threads.exp: New file.
2015-01-09 11:44:04 +00:00
Pedro Alves
c1a747c109 Linux: Skip thread_db thread event reporting if PTRACE_EVENT_CLONE is supported
[A test I wrote stumbled on a libthread_db issue related to thread
event breakpoints.  See glibc PR17705:
 [nptl_db: stale thread create/death events if debugger detaches]
 https://sourceware.org/bugzilla/show_bug.cgi?id=17705

This patch avoids that whole issue by making GDB stop using thread
event breakpoints in the first place, which is good for other reasons
as well, anyway.]

Before PTRACE_EVENT_CLONE (Linux 2.6), the only way to learn about new
threads in the inferior (to attach to them) or to learn about thread
exit was to coordinate with the inferior's glibc/runtime, using
libthread_db.  That works by putting a breakpoint at a magic address
which is called when a new thread is spawned, or when a thread is
about to exit.  When that breakpoint is hit, all threads are stopped,
and then GDB coordinates with libthread_db to read data structures out
of the inferior to learn about what happened.  Then the breakpoint is
single-stepped, and then all threads are re-resumed.  This isn't very
efficient (stops all threads) and is more fragile (inferior's thread
list in memory may be corrupt; libthread_db bugs, etc.) than ideal.

When the kernel supports PTRACE_EVENT_CLONE (which we already make use
of), there's really no need to use libthread_db's event reporting
mechanism to learn about new LWPs.  And if the kernel supports that,
then we learn about LWP exits through regular WIFEXITED wait statuses,
so no need for the death event breakpoint either.

GDBserver has been likewise skipping the thread_db events for a long
while:
  https://sourceware.org/ml/gdb-patches/2007-10/msg00547.html

There's one user-visible difference: we'll no longer print about
threads being created and exiting while the program is running, like:

 [Thread 0x7ffff7dbb700 (LWP 30670) exited]
 [New Thread 0x7ffff7db3700 (LWP 30671)]
 [Thread 0x7ffff7dd3700 (LWP 30667) exited]
 [New Thread 0x7ffff7dab700 (LWP 30672)]
 [Thread 0x7ffff7db3700 (LWP 30671) exited]
 [Thread 0x7ffff7dcb700 (LWP 30668) exited]

This is exactly the same behavior as when debugging against remote
targets / gdbserver.  I actually think that's a good thing (and as
such have listed this in the local/remote parity wiki page a while
ago), as the printing slows down the inferior.  It's also a
distraction to keep bothering the user about short-lived threads that
she won't be able to interact with anyway.  Instead, the user (and
frontend) will be informed about new threads that currently exist in
the program when the program next stops:

 (gdb) c
 ...
 * ctrl-c *
 [New Thread 0x7ffff7963700 (LWP 7797)]
 [New Thread 0x7ffff796b700 (LWP 7796)]

 Program received signal SIGINT, Interrupt.
 [Switching to Thread 0x7ffff796b700 (LWP 7796)]
 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:81
 81              testq   %rax,%rax
 (gdb) info threads

A couple of tests had assumptions on GDB thread numbers that no longer
hold.

Tested on x86_64 Fedora 20.

gdb/
2014-01-09  Pedro Alves  <palves@redhat.com>

	Skip enabling event reporting if the kernel supports
	PTRACE_EVENT_CLONE.
	* linux-thread-db.c: Include "nat/linux-ptrace.h".
	(thread_db_use_events): New function.
	(try_thread_db_load_1): Check thread_db_use_events before enabling
	event reporting.
	(update_thread_state): New function.
	(attach_thread): Use it.  Check thread_db_use_events before
	enabling event reporting.
	(thread_db_detach): Check thread_db_use_events before disabling
	event reporting.
	(find_new_threads_callback): Check thread_db_use_events before
	enabling event reporting.  Update the thread's state if not using
	libthread_db events.

gdb/testsuite/
2014-01-09  Pedro Alves  <palves@redhat.com>

	* gdb.threads/fork-thread-pending.exp: Switch to the main thread
	instead of to thread 2.
	* gdb.threads/signal-command-multiple-signals-pending.c (main):
	Add barrier around each pthread_create call instead of around all
	calls.
	* gdb.threads/signal-command-multiple-signals-pending.exp (test):
	Set a break on thread_function and have the child threads hit it
	one at at a time.
2015-01-09 11:42:57 +00:00
Joel Brobecker
32d0add0a6 Update year range in copyright notice of all files owned by the GDB project.
gdb/ChangeLog:

        Update year range in copyright notice of all files.
2015-01-01 13:32:14 +04:00
Pedro Alves
78708b7c8c GDBserver: ctrl-c after leader has exited
The target->request_interrupt callback implements the handling for
ctrl-c.  User types ctrl-c in GDB, GDB sends a \003 to the remote
target, and the remote targets stops the program with a SIGINT, just
like if the user typed ctrl-c in GDBserver's terminal.

The trouble is that using kill_lwp(signal_pid, SIGINT) sends the
SIGINT directly to the program's main thread.  If that thread has
exited already, then that kill won't do anything.

Instead, send the SIGINT to the process group, just like GDB
does (see inf-ptrace.c:inf_ptrace_stop).

gdb.threads/leader-exit.exp is extended to cover the scenario.  It
fails against GDBserver before the patch.

Tested on x86_64 Fedora 20, native and GDBserver.

gdb/gdbserver/
2014-11-12  Pedro Alves  <palves@redhat.com>

	* linux-low.c (linux_request_interrupt): Always send a SIGINT to
	the process group instead of to a specific LWP.

gdb/testsuite/
2014-11-12  Pedro Alves  <palves@redhat.com>

	* gdb.threads/leader-exit.exp: Test sending ctrl-c works after the
	leader has exited.
2014-11-12 11:30:49 +00:00
Pedro Alves
354204061c PR 17408 - assertion failure in switch_back_to_stepped_thread
This PR shows that GDB can easily trigger an assertion here, in
infrun.c:

 5392              /* Did we find the stepping thread?  */
 5393              if (tp->control.step_range_end)
 5394                {
 5395                  /* Yep.  There should only one though.  */
 5396                  gdb_assert (stepping_thread == NULL);
 5397
 5398                  /* The event thread is handled at the top, before we
 5399                     enter this loop.  */
 5400                  gdb_assert (tp != ecs->event_thread);
 5401
 5402                  /* If some thread other than the event thread is
 5403                     stepping, then scheduler locking can't be in effect,
 5404                     otherwise we wouldn't have resumed the current event
 5405                     thread in the first place.  */
 5406                  gdb_assert (!schedlock_applies (currently_stepping (tp)));
 5407
 5408                  stepping_thread = tp;
 5409                }

Like:

 gdb/infrun.c:5406: internal-error: switch_back_to_stepped_thread: Assertion `!schedlock_applies (1)' failed.

The way the assertion is written is assuming that with schedlock=step
we'll always leave threads other than the one with the stepping range
locked, while that's not true with the "next" command.  With schedlock
"step", other threads still run unlocked when "next" detects a
function call and steps over it.  Whether that makes sense or not,
still, it's documented that way in the manual.  If another thread hits
an event that doesn't cause a stop while the nexting thread steps over
a function call, we'll get here and fail the assertion.

The fix is just to adjust the assertion.  Even though we found the
stepping thread, we'll still step-over the breakpoint that just
triggered correctly.

Surprisingly, gdb.threads/schedlock.exp doesn't have any test that
steps over a function call.  This commits fixes that.  This ensures
that "next" doesn't switch focus to another thread, and checks whether
other threads run locked or not, depending on scheduler locking mode
and command.  There's a lot of duplication in that file that this ends
cleaning up.  There's more that could be cleaned up, but that would
end up an unrelated change, best done separately.

This new coverage in schedlock.exp happens to trigger the internal
error in question, like so:

 FAIL: gdb.threads/schedlock.exp: schedlock=step: cmd=next: call_function=1: next to increment (1) (GDB internal error)
 FAIL: gdb.threads/schedlock.exp: schedlock=step: cmd=next: call_function=1: next to increment (3) (GDB internal error)
 FAIL: gdb.threads/schedlock.exp: schedlock=step: cmd=next: call_function=1: next to increment (5) (GDB internal error)
 FAIL: gdb.threads/schedlock.exp: schedlock=step: cmd=next: call_function=1: next to increment (7) (GDB internal error)
 FAIL: gdb.threads/schedlock.exp: schedlock=step: cmd=next: call_function=1: next to increment (9) (GDB internal error)
 FAIL: gdb.threads/schedlock.exp: schedlock=step: cmd=next: call_function=1: next does not change thread (switched to thread 0)
 FAIL: gdb.threads/schedlock.exp: schedlock=step: cmd=next: call_function=1: current thread advanced - unlocked (wrong amount)

That's because we have more than one thread running the same loop, and
while one thread is stepping over a function call, the other thread
hits the step-resume breakpoint of the first, which needs to be
stepped over, and we end up in switch_back_to_stepped_thread exactly
in the problem case.

I think a simpler and more directed test is also useful, to not rely
on internal breakpoint magics.  So this commit also adds a test that
has a thread trip on a conditional breakpoint that doesn't cause a
user-visible stop while another thread is stepping over a call.  That
currently fails like this:

 FAIL: gdb.threads/next-bp-other-thread.exp: schedlock=step: next over function call (GDB internal error)

Tested on x86_64 Fedora 20.

gdb/
2014-10-29  Pedro Alves  <palves@redhat.com>

	PR gdb/17408
	* infrun.c (switch_back_to_stepped_thread): Use currently_stepping
	instead of assuming a thread with a stepping range is always
	stepping.

gdb/testsuite/
2014-10-29  Pedro Alves  <palves@redhat.com>

	PR gdb/17408
	* gdb.threads/schedlock.c (some_function): New function.
	(call_function): New global.
	(MAYBE_CALL_SOME_FUNCTION): New macro.
	(thread_function): Call it.
	* gdb.threads/schedlock.exp (get_args): Add description parameter,
	and use it instead of a global counter.  Adjust all callers.
	(get_current_thread): Use "find current thread" for test message
	here rather than having all callers pass down the same string.
	(goto_loop): New procedure, factored out from ...
	(my_continue): ... this.
	(step_ten_loops): Change parameter from test message to command to
	use.  Adjust.
	(list_count): Delete global.
	(check_result): New procedure, factored out from duplicate top
	level code.
	(continue tests): Wrap in with_test_prefix.
	(test_step): New procedure, factored out from duplicate top level
	code.
	(top level): Test "step" in combination with all scheduler-locking
	modes.  Test "next" in combination with all scheduler-locking
	modes, and in combination with stepping over a function call or
	not.
	* gdb.threads/next-bp-other-thread.c: New file.
	* gdb.threads/next-bp-other-thread.exp: New file.
2014-10-29 18:15:39 +00:00
Pedro Alves
09dd9a6907 Remove Vax Ultrix and VAX BSD support
Built and tested on x86_64 Fedora 20, with --enable-targets=all.

gdb/
2014-10-24  Pedro Alves  <palves@redhat.com>

	* Makefile.in (ALLDEPFILES): Remove vax-nat.c.
	* NEWS (Removed targets): Add VAX BSD and VAX Ultrix.
	* config/vax/vax.mh: Delete.
	* configure.host: Move vax-*-bsd* and vax-*-ultrix* to the
	obsolete configurations section.
	* configure.tgt (vax-*-*): Don't mention 4.2BSD nor Ultrix.
	* vax-nat.c: Delete file.

gdb/testsuite/
2014-10-24  Pedro Alves  <palves@redhat.com>

	* gdb.base/corefile.exp: Remove references to ultrix.
	* gdb.base/interrupt.exp: Likewise.
	* gdb.base/whatis.exp: Likewise.
	* gdb.gdb/selftest.exp: Likewise.
	* gdb.threads/manythreads.exp: Likewise.
	* gdb.threads/print-threads.exp: Likewise.
	* gdb.threads/pthreads.exp:: Likewise.
	* gdb.threads/schedlock.exp: Likewise.
2014-10-24 17:56:56 +01:00
Pedro Alves
32a8097ba5 Delete Tru64 support
This commit does most of the mechanical removal.  IOW, the easy part.

procfs.c isn't touched beyond removing a couple obvious bits that are
guarded by a couple macros defined in config/alpha/nm-osf3.h.  Going
beyond that for procfs.c & co would be a harder excision that
potentially affects Solaris.

Some comments in the generic alpha code ABIs that may still be
relevant and I wouldn't know what to do with them.  That can always be
done on a separate pass, preferably by someone who can test on alpha.

A couple other spots have references to OSF/Tru64 and related files
being removed, but it felt like removing them would make things worse,
not better.  We can revisit those when we next need to touch that
code.

I didn't remove a reference to osf in testsuite/lib/future.exp, as I
believe that code is imported from DejaGNU.

Built and tested on x86_64 Fedora 20, with --enable-targets=all.

Tested that building for --target=alpha-osf3 on x86_64 Fedora 20
fails with:

 checking for default auto-load directory... $debugdir:$datadir/auto-load
 checking for default auto-load safe-path... $debugdir:$datadir/auto-load
 *** Configuration alpha-unknown-osf3 is obsolete.
 *** Support has been REMOVED.
 make[1]: *** [configure-gdb] Error 1
 make[1]: Leaving directory `build-osf'
 make: *** [all] Error 2

gdb/
2014-10-17  Pedro Alves  <palves@redhat.com>

	* Makefile.in (ALL_64_TARGET_OBS): Remove alpha-osf1-tdep.o.
	(HFILES_NO_SRCDIR): Remove config/alpha/nm-osf3.h.
	(ALLDEPFILES): Remove alpha-nat.c, alpha-osf1-tdep.c and
	solib-osf.c.
	* NEWS: Mention that support for alpha*-*-osf* has been removed.
	* ada-lang.h [__alpha__ && __osf__]
	(ADA_KNOWN_RUNTIME_FILE_NAME_PATTERNS): Delete.
	* alpha-nat.c, alpha-osf1-tdep.c: Delete files.
	* alpha-tdep.c (alpha_gdbarch_init): Remove reference to
	GDB_OSABI_OSF1.
	* config/alpha/alpha-osf3.mh, config/alpha/nm-osf3.h: Delete
	files.
	* config/djgpp/fnchange.lst (config/alpha/alpha-osf1.mh)
	(config/alpha/alpha-osf2.mh, config/alpha/alpha-osf3.mh): Delete.
	* configure: Regenerate.
	* configure.ac: Remove references to osf.
	* configure.host: Handle alpha*-*-osf* in the obsolete hosts
	section.  Remove all other references to osf.
	* configure.tgt: Add alpha*-*-osf* to the obsolete targets section.
	Remove all other references to osf.
	* dec-thread.c: Delete file.
	* defs.h (GDB_OSABI_OSF1): Delete.
	* inferior.h (START_INFERIOR_TRAPS_EXPECTED): New unconditionally
	defined.
	* osabi.c (gdb_osabi_names): Delete "OSF/1".
	* procfs.c (procfs_debug_inferior) [PROCFS_DONT_TRACE_FAULTS]:
	Delete code.
	(unconditionally_kill_inferior)
	[PROCFS_NEED_CLEAR_CURSIG_FOR_KILL]: Delete code.
	* solib-osf.c: Delete file.

gdb/testsuite/
2014-10-17  Pedro Alves  <palves@redhat.com>

	* gdb.base/callfuncs.exp: emove references to osf.
	* gdb.base/sigall.exp: Likewise.
	* gdb.gdb/selftest.exp: Likewise.
	* gdb.hp/gdb.base-hp/callfwmall.exp: Likewise.
	* gdb.mi/non-stop.c: Likewise.
	* gdb.mi/pthreads.c: Likewise.
	* gdb.reverse/sigall-precsave.exp: Likewise.
	* gdb.reverse/sigall-reverse.exp: Likewise.
	* gdb.threads/pthreads.c: Likewise.
	* gdb.threads/pthreads.exp: Likewise.

gdb/doc/
2014-10-17  Pedro Alves  <palves@redhat.com>

	* gdb.texinfo (Ada Tasks and Core Files): Delete mention of Tru64.
	(SVR4 Process Information): Delete mention of OSF/1.
2014-10-17 11:18:59 +01:00
Yao Qi
052ca37073 No longer pull thread list explicitly
As the result of the patch below, GDB updates thread list when a stop is
presented to user.  The tests don't have to fetch thread list explicitly.

  [PATCH 3/3] Fix non-stop regressions caused by "breakpoints always-inserted off" changes
  https://sourceware.org/ml/gdb-patches/2014-09/msg00734.html

This patch is to remove the test code updating thread list.

Run these three tests many times on arm-linux-gnueabi and x86-linux.
No regressions.

gdb/testsuite:

2014-10-11  Yao Qi  <yao@codesourcery.com>

	* gdb.threads/thread-find.exp: Don't execute command
	"info threads".
	* gdb.threads/attach-into-signal.exp (corefunc): Likewise.
	* gdb.threads/linux-dp.exp: Don't check the condition
	$threads_created equals to zero.
2014-10-11 08:32:52 +08:00
Pedro Alves
2278c276a8 gdb.threads/manythreads.exp: clean up and add comment
In git b57bacec, I said:

> With that in place, the need to delay "Program received signal FOO"
> was actually caught by the manythreads.exp test.  Without that bit, I
> was getting:
>
>   [Thread 0x7ffff7f13700 (LWP 4499) exited]
>   [New Thread 0x7ffff7f0b700 (LWP 4500)]
>   ^C
>   Program received signal SIGINT, Interrupt.
>   [New Thread 0x7ffff7f03700 (LWP 4501)]           <<< new output
>   [Switching to Thread 0x7ffff7f0b700 (LWP 4500)]
>   __GI___nptl_death_event () at events.c:31
>   31      {
>   (gdb) FAIL: gdb.threads/manythreads.exp: stop threads 1
>
> That is, I was now getting "New Thread" lines after the "Program
> received signal" line, and the test doesn't expect them.  As the
> number of new threads discovered before and after the "Program
> received signal" output is unbounded, it's much nicer to defer
> "Program received signal" until after synching the thread list, thus
> close to the "switching to thread" output and "current frame/source"
> info:
>
>   [Thread 0x7ffff7863700 (LWP 7647) exited]
>   ^C[New Thread 0x7ffff786b700 (LWP 7648)]
>
>   Program received signal SIGINT, Interrupt.
>   [Switching to Thread 0x7ffff7fc4740 (LWP 6243)]
>   __GI___nptl_create_event () at events.c:25
>   25      {
>   (gdb) PASS: gdb.threads/manythreads.exp: stop threads 1

This commit factors out the two places in the test that are effected
by this, and adds there a destilled version of the comment above.

gdb/testsuite/
2014-10-02  Pedro Alves  <palves@redhat.com>

	* gdb.threads/manythreads.exp (interrupt_and_wait): New procedure.
	(top level) <stop threads 1, stop threads 2>: Use it.
2014-10-02 10:13:56 +01:00
Pedro Alves
b57bacecd5 Fix non-stop regressions caused by "breakpoints always-inserted off" changes
Commit a25a5a45 (Fix "breakpoint always-inserted off"; remove
"breakpoint always-inserted auto") regressed non-stop remote
debugging.

This was exposed by mi-nsintrall.exp intermittently failing with a
spurious SIGTRAP.

The problem is that when debugging with "target remote", new threads
the target has spawned but have never reported a stop aren't visible
to GDB until it explicitly resyncs its thread list with the target's.

For example, in a program like this:

 int
 main (void)
 {
   pthread_t child_thread;
   pthread_create (&child_thread, NULL, child_function, NULL);
   return 0;  <<<< set breakpoint here
 }

If the user sets a breakpoint at the "return" statement, and runs the
program, when that breakpoint hit is reported, GDB is only aware of
the main thread.  So if we base the decision to remove or insert
breakpoints from the target based on whether all the threads we know
about are stopped, we'll miss that child_thread is running, and thus
we'll remove breakpoints from the target, even through they should
still remain inserted, otherwise child_thread will miss them.

The break-while-running.exp test actually should also be exposing this
thread-list-out-of-synch problem.  That test sets a breakpoint while
the main thread is stopped, but other threads are running.  Because
other threads are running, the breakpoint is supposed to be inserted
immediately.  But, unless something forces a refetch of the thread
list, like, e.g., "info threads", GDB won't be aware of the other
threads that had been spawned by the main thread, and so won't insert
new or old breakpoints in the target.  And it turns out that the test
is exactly doing an explicit "info threads", masking out the
problem...  This commit adjust the test to exercise the case of not
issuing "info threads".  The test then fails without the GDB fix.

In the ni-nsintrall.exp case, what happens is that several threads hit
the same breakpoint, and when the first thread reports the stop,
because GDB wasn't aware other threads exist, all threads known to GDB
are found stopped, so GDB removes the breakpoints from the target.
The other threads follow up with SIGTRAPs too for that same
breakpoint, which has already been removed.  For the first few
threads, the moribund breakpoints machinery suppresses the SIGTRAPs,
but after a few events (precisely '3 * thread_count () + 1' at the
time the breakpoint was removed, see update_global_location_list), the
moribund breakpoint machinery is no longer aware of the removed
breakpoint, and the SIGTRAP is reported as a spurious stop.

The fix is naturally then to stop assuming that if no thread in the
list is executing, then the target is fully stopped.  We can't know
that until we fully sync the thread list.  Because updating the thread
list on every stop would be too much RSP traffic, I chose instead to
update it whenever we're about to present a stop to the user.

Actually updating the thread list at that point happens to be an item
I had added to the local/remote parity wiki page a while ago:

  Native GNU/Linux debugging adds new threads to the thread list as
  the program creates them "The [New Thread foo] messages". Remote
  debugging can't do that, and it's arguable whether we shouldn't even
  stop native debugging from doing that, as it hinders inferior
  performance. However, a related issue is that with remote targets
  (and gdbserver), even after the program stops, the user still needs
  to do "info threads" to pull an updated thread list. This, should
  most likely be addressed, so that GDB pulls the list itself, perhaps
  just before presenting a stop to the user.

With that in place, the need to delay "Program received signal FOO"
was actually caught by the manythreads.exp test.  Without that bit, I
was getting:

  [Thread 0x7ffff7f13700 (LWP 4499) exited]
  [New Thread 0x7ffff7f0b700 (LWP 4500)]
  ^C
  Program received signal SIGINT, Interrupt.
  [New Thread 0x7ffff7f03700 (LWP 4501)]           <<< new output
  [Switching to Thread 0x7ffff7f0b700 (LWP 4500)]
  __GI___nptl_death_event () at events.c:31
  31      {
  (gdb) FAIL: gdb.threads/manythreads.exp: stop threads 1

That is, I was now getting "New Thread" lines after the "Program
received signal" line, and the test doesn't expect them.  As the
number of new threads discovered before and after the "Program
received signal" output is unbounded, it's much nicer to defer
"Program received signal" until after synching the thread list, thus
close to the "switching to thread" output and "current frame/source"
info:

  [Thread 0x7ffff7863700 (LWP 7647) exited]
  ^C[New Thread 0x7ffff786b700 (LWP 7648)]

  Program received signal SIGINT, Interrupt.
  [Switching to Thread 0x7ffff7fc4740 (LWP 6243)]
  __GI___nptl_create_event () at events.c:25
  25      {
  (gdb) PASS: gdb.threads/manythreads.exp: stop threads 1

Tested on x86_64 Fedora 20, native and gdbserver.

gdb/
2014-10-02  Pedro Alves  <palves@redhat.com>

	* breakpoint.c (breakpoints_should_be_inserted_now): Use
	threads_are_executing.
	* breakpoint.h (breakpoints_should_be_inserted_now): Add
	describing comment.
	* gdbthread.h (threads_are_executing): Declare.
	(handle_signal_stop) <random signals>: Don't print about the
	signal here if stopping.
	(end_stepping_range): Don't notify observers here.
	(normal_stop): Update the thread list.  If stopped by a random
	signal or a stepping range ended, notify observers.
	* thread.c (threads_executing): New global.
	(init_thread_list): Clear 'threads_executing'.
	(set_executing): Set or clear 'threads_executing'.
	(threads_are_executing): New function.
	(update_threads_executing): New function.
	(update_thread_list): Use it.

gdb/testsuite/
2014-10-02  Pedro Alves  <palves@redhat.com>

	* gdb.threads/break-while-running.exp (test): Add new
	'update_thread_list' argument.  Skip "info threads" if false.
	(top level): Add new 'update_thread_list' axis.
2014-10-02 10:08:00 +01:00
Yao Qi
345bcc73f2 Skip dlopen-libpthread.exp in cross testing
I see the following fails on arm-linux-gnueabi,

result of ldd build-git/arm/gdb/testsuite/gdb.threads/dlopen-libpthread.so is 1
output of ldd build-git/arm/gdb/testsuite/gdb.threads/dlopen-libpthread.so is not a dynamic executable
child process exited abnormally
FAIL: gdb.threads/dlopen-libpthread.exp: ldd dlopen-libpthread.so
FAIL: gdb.threads/dlopen-libpthread.exp: ldd dlopen-libpthread.so output contains libs

the test script invokes ldd (on host) for the target libraries, which
is wrong.  ldd can't be cross because it invokes dynamic linker with
LD_TRACE_LOADED_OBJECTS and gets the dependent libraries.  My first
reaction to this problem is to execute ld.so on the target (like
remote_exec target).  When I start to hack proc build_executable_own_libs,
I find it has assumptions here and there that the native testing is
performed.  Then I check the callers of build_executable_own_libs,
and they are all skipped if isnative is false.  It is reasonable to do
the same in dlopen-libpthread.exp too.

gdb/testsuite:

2014-09-30  Yao Qi  <yao@codesourcery.com>

	* gdb.threads/dlopen-libpthread.exp: Skip it if isnative is
	false.
2014-09-30 11:42:51 +08:00
Pedro Alves
a25a5a45ef Fix "breakpoint always-inserted off"; remove "breakpoint always-inserted auto"
By default, GDB removes all breakpoints from the target when the
target stops and the prompt is given back to the user.  This is useful
in case GDB crashes while the user is interacting, as otherwise,
there's a higher chance breakpoints would be left planted on the
target.

But, as long as any thread is running free, we need to make sure to
keep breakpoints inserted, lest a thread misses a breakpoint.  With
that in mind, in preparation for non-stop mode, we added a "breakpoint
always-inserted on" mode.  This traded off the extra crash protection
for never having threads miss breakpoints, and in addition is more
efficient if there's a ton of breakpoints to remove/insert at each
user command (e.g., at each "step").

When we added non-stop mode, and for a period, we required users to
manually set "always-inserted on" when they enabled non-stop mode, as
otherwise GDB removes all breakpoints from the target as soon as any
thread stops, which means the other threads still running will miss
breakpoints.  The test added by this patch exercises this.

That soon revealed a nuisance, and so later we added an extra
"breakpoint always-inserted auto" mode, that made GDB behave like
"always-inserted on" when non-stop was enabled, and "always-inserted
off" when non-stop was disabled.  "auto" was made the default at the
same time.

In hindsight, this "auto" setting was unnecessary, and not the ideal
solution.  Non-stop mode does depends on breakpoints always-inserted
mode, but only as long as any thread is running.  If no thread is
running, no breakpoint can be missed.  The same is true for all-stop
too.  E.g., if, in all-stop mode, and the user does:

 (gdb) c&
 (gdb) b foo

That breakpoint at "foo" should be inserted immediately, but it
currently isn't -- currently it'll end up inserted only if the target
happens to trip on some event, and is re-resumed, e.g., an internal
breakpoint triggers that doesn't cause a user-visible stop, and so we
end up in keep_going calling insert_breakpoints.  The test added by
this patch also covers this.

IOW, no matter whether in non-stop or all-stop, if the target fully
stops, we can remove breakpoints.  And no matter whether in all-stop
or non-stop, if any thread is running in the target, then we need
breakpoints to be immediately inserted.  And then, if the target has
global breakpoints, we need to keep breakpoints even when the target
is stopped.

So with that in mind, and aiming at reducing all-stop vs non-stop
differences for all-stop-on-stop-of-non-stop, this patch fixes
"breakpoint always-inserted off" to not remove breakpoints from the
target until it fully stops, and then removes the "auto" setting as
unnecessary.  I propose removing it straight away rather than keeping
it as an alias, unless someone complains they have scripts that need
it and that can't adjust.

Tested on x86_64 Fedora 20.

gdb/
2014-09-22  Pedro Alves  <palves@redhat.com>

	* NEWS: Mention merge of "breakpoint always-inserted" modes "off"
	and "auto" merged.
	* breakpoint.c (enum ugll_insert_mode): New enum.
	(always_inserted_mode): Now a plain boolean.
	(show_always_inserted_mode): No longer handle AUTO_BOOLEAN_AUTO.
	(breakpoints_always_inserted_mode): Delete.
	(breakpoints_should_be_inserted_now): New function.
	(insert_breakpoints): Pass UGLL_INSERT to
	update_global_location_list instead of calling
	insert_breakpoint_locations manually.
	(create_solib_event_breakpoint_1): New, factored out from ...
	(create_solib_event_breakpoint): ... this.
	(create_and_insert_solib_event_breakpoint): Use
	create_solib_event_breakpoint_1 instead of calling
	insert_breakpoint_locations manually.
	(update_global_location_list): Change parameter type from boolean
	to enum ugll_insert_mode.  All callers adjusted.  Adjust to use
	breakpoints_should_be_inserted_now and handle UGLL_INSERT.
	(update_global_location_list_nothrow): Change parameter type from
	boolean to enum ugll_insert_mode.
	(_initialize_breakpoint): "breakpoint always-inserted" option is
	now a boolean command.  Update help text.
	* breakpoint.h (breakpoints_always_inserted_mode): Delete declaration.
	(breakpoints_should_be_inserted_now): New declaration.
	* infrun.c (handle_inferior_event) <TARGET_WAITKIND_LOADED>:
	Remove breakpoints_always_inserted_mode check.
	(normal_stop): Adjust to use breakpoints_should_be_inserted_now.
	* remote.c (remote_start_remote): Likewise.

gdb/doc/
2014-09-22  Pedro Alves  <palves@redhat.com>

	* gdb.texinfo (Set Breaks): Document that "set breakpoint
	always-inserted off" is the default mode now.  Delete
	documentation of "set breakpoint always-inserted auto".

gdb/testsuite/
2014-09-22  Pedro Alves  <palves@redhat.com>

	* gdb.threads/break-while-running.exp: New file.
	* gdb.threads/break-while-running.c: New file.
2014-09-22 10:07:04 +01:00
Doug Evans
57cbd724c3 Fix set up of queue-signal.exp test.
The test does a backtrace to see which thread (#2 or #3) is assigned
to which SIGUSR (1 or 2).  If the main thread gets to all_threads_running
before the sigusr threads get to their entry point, then the function
name isn't in the backtrace and the test fails.

Alas this version of the code is within epsilon of what I started with,
and then over-simplified things.
2014-09-14 10:48:38 -07:00
Doug Evans
81219e5358 New command queue-signal.
If I want to change the signalled state of multiple threads
it's a bit cumbersome to do with the "signal" command.
What you really want is a way to set the signal state of the
desired threads and then just do "continue".

This patch adds a new command, queue-signal, to accomplish this.
Basically "signal N" == "queue-signal N" + "continue".
That's not precisely true in that "signal" can be used to inject
any signal, including signals set to "nopass"; whereas "queue-signal"
just queues the signal as if the thread stopped because of it.
"nopass" handling is done when the thread is resumed which
"queue-signal" doesn't do.

One could add extra complexity to allow queue-signal to be used to
deliver "nopass" signals like the "signal" command.  I have no current
need for it so in the interests of incremental complexity, I have
left such support out and just have the code flag an error if one
tries to queue a nopass signal.

gdb/ChangeLog:

	* NEWS: Mention new "queue-signal" command.
	* infcmd.c (queue_signal_command): New function.
	(_initialize_infcmd): Add new queue-signal command.

gdb/doc/ChangeLog:

	* gdb.texinfo (Signaling): Document new queue-signal command.

gdb/testsuite/ChangeLog:

	* gdb.threads/queue-signal.c: New file.
	* gdb.threads/queue-signal.exp: New file.
2014-09-13 21:44:00 -07:00
Pedro Alves
fa43b1d7ca after gdb_run_cmd, gdb_expect -> gdb_test_multiple/gdb_test
See:
  https://sourceware.org/ml/gdb-patches/2014-09/msg00404.html

We have a number of places that do gdb_run_cmd followed by gdb_expect,
when it would be better to use gdb_test_multiple or gdb_test.

This converts all that "grep gdb_run_cmd -A 2 | grep gdb_expect"
found.

Tested on x86_64 Fedora 20, native and gdbserver.

gdb/testsuite/
2014-09-12  Pedro Alves  <palves@redhat.com>

	* gdb.arch/gdb1558.exp: Replace uses of gdb_expect after
	gdb_run_cmd with gdb_test_multiple or gdb_test throughout.
	* gdb.arch/i386-size-overlap.exp: Likewise.
	* gdb.arch/i386-size.exp: Likewise.
	* gdb.arch/i386-unwind.exp: Likewise.
	* gdb.base/a2-run.exp: Likewise.
	* gdb.base/break.exp: Likewise.
	* gdb.base/charset.exp: Likewise.
	* gdb.base/chng-syms.exp: Likewise.
	* gdb.base/commands.exp: Likewise.
	* gdb.base/dbx.exp: Likewise.
	* gdb.base/find.exp: Likewise.
	* gdb.base/funcargs.exp: Likewise.
	* gdb.base/jit-simple.exp: Likewise.
	* gdb.base/reread.exp: Likewise.
	* gdb.base/sepdebug.exp: Likewise.
	* gdb.base/step-bt.exp: Likewise.
	* gdb.cp/mb-inline.exp: Likewise.
	* gdb.cp/mb-templates.exp: Likewise.
	* gdb.objc/basicclass.exp: Likewise.
	* gdb.threads/killed.exp: Likewise.
2014-09-12 22:16:31 +01:00
Doug Evans
564b7600f2 gdb.threads/thread-execl.exp: #include <stdio.h>.
gdb/testsuite/ChangeLog:

	* gdb.threads/thread-execl.exp: #include <stdio.h>.
2014-08-25 12:23:50 -07:00
Jan Kratochvil
22fd09ae99 Fix 'gcore' with exited threads
Program received signal SIGABRT, Aborted.
[...]
(gdb) gcore foobar
Couldn't get registers: No such process.
(gdb) info threads
[...]
(gdb) gcore foobar
Saved corefile foobar
(gdb)

gcore tries to access the exited thread:
[Thread 0x7ffff7fce700 (LWP 6895) exited]
ptrace(PTRACE_GETREGS, 6895, 0, 0x7fff18167dd0) = -1 ESRCH (No such process)

Without the TRY_CATCH protection testsuite FAILs for:
	gcore .../gdb/testsuite/gdb.threads/gcore-thread0.test
	Cannot find new threads: debugger service failed
	(gdb) FAIL: gdb.threads/gcore-thread.exp: save a zeroed-threads corefile
	+
	core .../gdb/testsuite/gdb.threads/gcore-thread0.test
	".../gdb/testsuite/gdb.threads/gcore-thread0.test" is not a core dump: File format not recognized
	(gdb) FAIL: gdb.threads/gcore-thread.exp: core0file: re-load generated corefile (bad file format)
Maybe the TRY_CATCH could be more inside update_thread_list().

Similar update_thread_list() call is IMO missing in procfs_make_note_section()
but I do not have where to verify that change.

gdb/ChangeLog
2014-08-21  Jan Kratochvil  <jan.kratochvil@redhat.com>

	* linux-tdep.c (linux_corefile_thread_callback): Ignore THREAD_EXITED.
	(linux_make_corefile_notes): call update_thread_list, protected against
	exceptions.

gdb/testsuite/ChangeLog
2014-08-21  Jan Kratochvil  <jan.kratochvil@redhat.com>

	* gdb.threads/gcore-stale-thread.c: New file.
	* gdb.threads/gcore-stale-thread.exp: New file.
2014-08-21 20:36:20 +02:00
Pedro Alves
a8454a7c5a Remove useless gcore command detection
Checking whether the gcore command is included in the GDB build as
proxy for checking whether core dumping is supported by the target is
useless, as gcore.o has been in COMMON_OBS since git 9b4eba8e:

    2009-10-26  Michael Snyder  <msnyder@vmware.com>
                Hui Zhu  <teawater@gmail.com>

        * Makefile.in (SFILES): Add gcore.c.
        (COMMON_OBS): Add gcore.o.
        * config/alpha/alpha-linux.mh (NATDEPFILES): Delete gcore.o.
        * config/alpha/fbsd.mh (NATDEPFILES): Ditto.
	...

IOW, the command is always included in the build.

Instead, nowadays, tests bail out if actually trying to generate a
core fails with an indication the target doesn't support it.  See
gdb_gcore_cmd and callers.

Tested on x86_64 Fedora 20.

gdb/testsuite/ChangeLog:

	* gdb.base/gcore-buffer-overflow.exp: Remove "help gcore" test.
	* gdb.base/gcore-relro-pie.exp: Likewise.
	* gdb.base/gcore-relro.exp: Likewise.
	* gdb.base/gcore.exp: Likewise.
	* gdb.base/print-symbol-loading.exp: Likewise.
	* gdb.threads/gcore-thread.exp: Likewise.
	* lib/gdb.exp (gdb_gcore_cmd): Don't expect "Undefined command".
2014-08-21 11:36:59 +01:00
Pedro Alves
cc6563d29b gdb.threads/signal-command-handle-nopass.exp: Add comment
Explain why we do "info threads".

gdb/testsuite/
2014-07-30  Pedro Alves  <palves@redhat.com>

	* gdb.threads/signal-command-handle-nopass.exp (test): Add
	comment.
2014-07-30 12:19:30 +01:00
Pedro Alves
705096250d Always pass signals to the right thread
Currently, GDB can pass a signal to the wrong thread in several
different but related scenarios.

E.g., if thread 1 stops for signal SIGFOO, the user switches to thread
2, and then issues "continue", SIGFOO is actually delivered to thread
2, not thread 1.  This obviously messes up programs that use
pthread_kill to send signals to specific threads.

This has been a known issue for a long while.  Back in 2008 when I
made stop_signal be per-thread (2020b7ab), I kept the behavior -- see
code in 'proceed' being removed -- wanting to come back to it later.
The time has finally come now.

The patch fixes this -- on resumption, intercepted signals are always
delivered to the thread that had intercepted them.

Another example: if thread 1 stops for a breakpoint, the user switches
to thread 2, and then issues "signal SIGFOO", SIGFOO is actually
delivered to thread 1, not thread 2, because 'proceed' first switches
to thread 1 to step over its breakpoint...  If the user deletes the
breakpoint before issuing "signal FOO", then the signal is delivered
to thread 2 (the current thread).

"signal SIGFOO" can be used for two things: inject a signal in the
program while the program/thread had stopped for none, bypassing
"handle nopass"; or changing/suppressing a signal the program had
stopped for.  These scenarios are really two faces of the same coin,
and GDB can't really guess what the user is trying to do.  GDB might
have intercepted signals in more than one thread even (see the new
signal-command-multiple-signals-pending.exp test).  At least in the
inject case, it's obviously clear to me that the user means to deliver
the signal to the currently selected thread, so best is to make the
command's behavior consistent and easy to explain.

Then, if the user is trying to suppress/change a signal the program
had stopped for instead of injecting a new signal, but, the user had
changed threads meanwhile, then she will be surprised that with:

  (gdb) continue
  Thread 1 stopped for signal SIGFOO.
  (gdb) thread 2
  (gdb) signal SIGBAR

... GDB actually delivers SIGFOO to thread 1, and SIGBAR to thread 2
(with scheduler-locking off, which is the default, because then
"signal" or any other resumption command resumes all threads).

So the patch makes GDB detect that, and ask for confirmation:

  (gdb) thread 1
  [Switching to thread 1 (Thread 10979)]
  (gdb) signal SIGUSR2
  Note:
    Thread 3 previously stopped with signal SIGUSR2, User defined signal 2.
    Thread 2 previously stopped with signal SIGUSR1, User defined signal 1.
  Continuing thread 1 (the current thread) with specified signal will
  still deliver the signals noted above to their respective threads.
  Continue anyway? (y or n)

All these scenarios are covered by the new tests.

Tested on x86_64 Fedora 20, native and gdbserver.

gdb/
2014-07-25  Pedro Alves  <palves@redhat.com>

	* NEWS: Mention signal passing and "signal" command changes.
	* gdbthread.h (struct thread_suspend_state) <stop_signal>: Extend
	comment.
	* breakpoint.c (until_break_command): Adjust clear_proceed_status
	call.
	* infcall.c (run_inferior_call): Adjust clear_proceed_status call.
	* infcmd.c (proceed_thread_callback, continue_1, step_once)
	(jump_command): Adjust clear_proceed_status call.
	(signal_command): Warn if other thread that are resumed have
	signals that will be delivered.  Adjust clear_proceed_status call.
	(until_next_command, finish_command)
	(proceed_after_attach_callback, attach_command_post_wait)
	(attach_command): Adjust clear_proceed_status call.
	* infrun.c (proceed_after_vfork_done): Likewise.
	(proceed_after_attach_callback): Adjust comment.
	(clear_proceed_status_thread): Clear stop_signal if not in pass
	state.
	(clear_proceed_status_callback): Delete.
	(clear_proceed_status): New 'step' parameter.  Only clear the
	proceed status of threads the command being prepared is about to
	resume.
	(proceed): If passed in an explicit signal, override stop_signal
	with it.  Don't pass the last stop signal to the thread we're
	resuming.
	(init_wait_for_inferior): Adjust clear_proceed_status call.
	(switch_back_to_stepped_thread): Clear the signal if it should not
	be passed.
	* infrun.h (clear_proceed_status): New 'step' parameter.
	(user_visible_resume_ptid): Add comment.
	* linux-nat.c (linux_nat_resume_callback): Don't check whether the
	signal is in pass state.
	* remote.c (append_pending_thread_resumptions): Likewise.
	* mi/mi-main.c (proceed_thread): Adjust clear_proceed_status call.

gdb/doc/
2014-07-25  Pedro Alves  <palves@redhat.com>
	    Eli Zaretskii  <eliz@gnu.org>

	* gdb.texinfo (Signaling) <signal command>: Explain what happens
	with multi-threaded programs.

gdb/testsuite/
2014-07-25  Pedro Alves  <palves@redhat.com>

	* gdb.threads/signal-command-handle-nopass.c: New file.
	* gdb.threads/signal-command-handle-nopass.exp: New file.
	* gdb.threads/signal-command-multiple-signals-pending.c: New file.
	* gdb.threads/signal-command-multiple-signals-pending.exp: New file.
	* gdb.threads/signal-delivered-right-thread.c: New file.
	* gdb.threads/signal-delivered-right-thread.exp: New file.
2014-07-25 16:57:31 +01:00
Pedro Alves
e76126e8d1 GDBserver crashes when killing a multi-thread process
Here's an example, with the new test:

 gdbserver :9999 gdb.threads/kill
 gdb gdb.threads/kill
 (gdb) b 52
 Breakpoint 1 at 0x4007f4: file kill.c, line 52.
 Continuing.

 Breakpoint 1, main () at kill.c:52
 52        return 0; /* set break here */
 (gdb) k
 Kill the program being debugged? (y or n) y

 gdbserver :9999 gdb.threads/kill
 Process gdb.base/watch_thread_num created; pid = 9719
 Listening on port 1234
 Remote debugging from host 127.0.0.1
 Killing all inferiors
 Segmentation fault (core dumped)

Backtrace:

 (gdb) bt
 #0  0x00000000004068a0 in find_inferior (list=0x66b060 <all_threads>, func=0x427637 <kill_one_lwp_callback>, arg=0x7fffffffd3fc) at src/gdb/gdbserver/inferiors.c:199
 #1  0x00000000004277b6 in linux_kill (pid=15708) at src/gdb/gdbserver/linux-low.c:966
 #2  0x000000000041354d in kill_inferior (pid=15708) at src/gdb/gdbserver/target.c:163
 #3  0x00000000004107e9 in kill_inferior_callback (entry=0x6704f0) at src/gdb/gdbserver/server.c:2934
 #4  0x0000000000406522 in for_each_inferior (list=0x66b050 <all_processes>, action=0x4107a6 <kill_inferior_callback>) at src/gdb/gdbserver/inferiors.c:57
 #5  0x0000000000412377 in process_serial_event () at src/gdb/gdbserver/server.c:3767
 #6  0x000000000041267c in handle_serial_event (err=0, client_data=0x0) at src/gdb/gdbserver/server.c:3880
 #7  0x00000000004189ff in handle_file_event (event_file_desc=4) at src/gdb/gdbserver/event-loop.c:434
 #8  0x00000000004181c6 in process_event () at src/gdb/gdbserver/event-loop.c:189
 #9  0x0000000000418f45 in start_event_loop () at src/gdb/gdbserver/event-loop.c:552
 #10 0x0000000000411272 in main (argc=3, argv=0x7fffffffd8d8) at src/gdb/gdbserver/server.c:3283

The problem is that linux_wait_for_event deletes lwps that have exited
(even those not passed in as lwps of interest), while the lwp/thread
list is being walked on with find_inferior.  find_inferior can handle
the current iterated inferior being deleted, but not others.

When killing lwps, we don't really care about any of the pending
status handling of linux_wait_for_event.  We can just waitpid the lwps
directly, which is also what GDB does (see
linux-nat.c:kill_wait_callback).  This way the lwps are not deleted
while we're walking the list.  They'll be deleted by linux_mourn
afterwards.

This crash triggers several times when running the testsuite against
GDBserver with the native-gdbserver board (target remote), but as GDB
can't distinguish between GDBserver crashing and "kill" being
sucessful, as in both cases the connection is closed (the 'k' packet
doesn't require a reply), and the inferior is gone, that results in no
FAIL.

The patch adds a generic test that catches the issue with
extended-remote mode (and works fine with native testing too).  Here's
how it fails with the native-extended-gdbserver board without the fix:

 (gdb) info threads
   Id   Target Id         Frame
   6    Thread 15367.15374 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
   5    Thread 15367.15373 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
   4    Thread 15367.15372 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
   3    Thread 15367.15371 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
   2    Thread 15367.15370 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
 * 1    Thread 15367.15367 main () at .../gdb.threads/kill.c:52
 (gdb) kill
 Kill the program being debugged? (y or n) y
 Remote connection closed
 ^^^^^^^^^^^^^^^^^^^^^^^^
 (gdb) FAIL: gdb.threads/kill.exp: kill

Extended remote should remain connected after the kill.

gdb/gdbserver/
2014-07-11  Pedro Alves  <palves@redhat.com>

	* linux-low.c (kill_wait_lwp): New function, based on
	kill_one_lwp_callback, but use my_waitpid directly.
	(kill_one_lwp_callback, linux_kill): Use it.

gdb/testsuite/
2014-07-11  Pedro Alves  <palves@redhat.com>

	* gdb.threads/kill.c: New file.
	* gdb.threads/kill.exp: New file.
2014-07-11 11:07:13 +01:00