Running the testsuite with "maint set target-non-stop on" shows:
(gdb) PASS: gdb.base/valgrind-infcall.exp: continue #98 (false warning)
continue
Continuing.
dl_main (phdr=<optimized out>..., auxv=<optimized out>) at rtld.c:2302
2302 LIBC_PROBE (init_complete, 2, LM_ID_BASE, r);
Cannot access memory at address 0x400532
(gdb) PASS: gdb.base/valgrind-infcall.exp: continue #99 (false warning)
p gdb_test_infcall ()
$1 = 1
(gdb) FAIL: gdb.base/valgrind-infcall.exp: p gdb_test_infcall ()
Even though that was a native GNU/Linux test run, this test spawns
Valgrind and connects to it with "target remote". The error above is
actually orthogonal to target-non-stop. The real issue is that that
enables displaced stepping, and displaced stepping doesn't work with
Valgrind, because we can't write to the inferior memory (thus can't
copy the instruction to the scratch pad area).
I'm sure there will be other targets with the same issue, so trying to
identify Valgrind wouldn't be sufficient. The fix is to try setting
up the displaced step anyway. If we get a MEMORY_ERROR, we disable
displaced stepping for that inferior, and fall back to doing an
in-line step-over. If "set displaced-stepping" is "on" (as opposed to
"auto), GDB warns displaced stepping failed ("on" is mainly useful for
the testsuite, not for users).
Tested on x86_64 Fedora 20.
gdb/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* inferior.h (struct inferior) <displaced_stepping_failed>: New
field.
* infrun.c (use_displaced_stepping_now_p): New parameter 'inf'.
Return false if dispaced stepping failed before.
(resume): Pass the current inferior to
use_displaced_stepping_now_p. Wrap displaced_step_prepare in
TRY/CATCH. If we get a MEMORY_ERROR, set the inferior's
displaced_stepping_failed flag, and fall back to an in-line
step-over.
gdb/testsuite/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* gdb.base/valgrind-disp-step.c: New file.
* gdb.base/valgrind-disp-step.exp: New file.
On a target that is both always in non-stop mode and can do displaced
stepping (such as native x86_64 GNU/Linux, with "maint set
target-non-stop on"), the step-over-trips-on-watchpoint.exp test
sometimes fails like this:
(gdb) PASS: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: thread 1
set scheduler-locking off
(gdb) PASS: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: set scheduler-locking off
step
-[Switching to Thread 0x7ffff7fc0700 (LWP 11782)]
-Hardware watchpoint 4: watch_me
-
-Old value = 0
-New value = 1
-child_function (arg=0x0) at /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.threads/step-over-trips-on-watchpoint.c:39
-39 other = 1; /* set thread-specific breakpoint here */
-(gdb) PASS: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: step
+wait_threads () at /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.threads/step-over-trips-on-watchpoint.c:49
+49 return 1; /* in wait_threads */
+(gdb) FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: step
Note "scheduler-locking" was set off. The problem is that on such
targets, the step-over of thread 2 and the "step" of thread 1 can be
set to run simultaneously (since with displaced stepping the
breakpoint isn't ever removed from the target), and sometimes, the
"step" of thread 1 finishes first, so it'd take another resume to see
the watchpoint trigger. Fix this by replacing the wait_threads
function with a one-line infinite loop that doesn't call any function,
so that the "step" of thread 1 never finishes.
gdb/testsuite/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* gdb.threads/step-over-lands-on-breakpoint.c (wait_threads):
Delete function.
(main): Add alarm. Run an infinite loop instead of calling
wait_threads.
* gdb.threads/step-over-lands-on-breakpoint.exp (do_test): Change
comment.
* gdb.threads/step-over-trips-on-watchpoint.c (wait_threads):
Delete function.
(main): Add alarm. Run an infinite loop instead of calling
wait_threads.
* gdb.threads/step-over-trips-on-watchpoint.exp (do_test): Change
comment.
With "maint set target-non-stop on" we get:
@@ -66,13 +66,16 @@ Continuing.
interrupt
(gdb) PASS: gdb.base/interrupt-noterm.exp: interrupt
-Program received signal SIGINT, Interrupt.
-PASS: gdb.base/interrupt-noterm.exp: inferior received SIGINT
-testcase src/gdb/testsuite/gdb.base/interrupt-noterm.exp completed in 0 seconds
+[process 12119] #1 stopped.
+0x0000003615ebc6d0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
+81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
+FAIL: gdb.base/interrupt-noterm.exp: inferior received SIGINT (timeout)
+testcase src/gdb/testsuite/gdb.base/interrupt-noterm.exp completed in 10 seconds
That is, we get "[$thread] #1 stopped" instead of SIGINT.
The issue is that we don't currently distinguish send
"interrupt/ctrl-c" to target terminal vs "stop/pause" thread well;
both cases go through "target_stop".
And then, the native Linux backend (linux-nat.c) implements
target_stop with SIGSTOP in non-stop mode, and SIGINT in all-stop
mode. Since "maint set target-non-stop on" forces the backend to be
always running in non-stop mode, even though the user-visible behavior
is "set non-stop" is "off", "interrupt" causes a SIGSTOP instead of
the SIGINT the test expects.
Fix this by introducing a target_interrupt method to use in the
"interrupt/ctrl-c" case, so "set non-stop off" can always work the
same irrespective of "maint set target-non-stop on/off". I'm
explictly considering changing the "set non-stop on" behavior as out
of scope here.
Most of the patch is an across-the-board rename of to_stop hook
implementations to to_interrupt. The only targets where something
more than a rename is being done are linux-nat.c and remote.c, which
are the only targets that support async, and thus are the only ones
the core side calls target_stop on.
gdb/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* darwin-nat.c (darwin_stop): Rename to ...
(darwin_interrupt): ... this.
(_initialize_darwin_inferior): Adjust.
* gnu-nat.c (gnu_stop): Delete.
(gnu_target): Don't install gnu_stop.
* inf-ptrace.c (inf_ptrace_stop): Rename to ...
(inf_ptrace_interrupt): ... this.
(inf_ptrace_target): Adjust.
* infcmd.c (interrupt_target_1): Use target_interrupt instead of
target_stop.
* linux-nat (linux_nat_stop): Rename to ...
(linux_nat_interrupt): ... this.
(linux_nat_stop): Reimplement.
(linux_nat_add_target): Install linux_nat_interrupt.
* nto-procfs.c (nto_interrupt_twice): Rename to ...
(nto_handle_sigint_twice): ... this.
(nto_interrupt): Rename to ...
(nto_handle_sigint): ... this. Call target_interrupt instead of
target_stop.
(procfs_wait): Adjust.
(procfs_stop): Rename to ...
(procfs_interrupt): ... this.
(init_procfs_targets): Adjust.
* procfs.c (procfs_stop): Rename to ...
(procfs_interrupt): ... this.
(procfs_target): Adjust.
* remote-m32r-sdi.c (m32r_stop): Rename to ...
(m32r_interrupt): ... this.
(init_m32r_ops): Adjust.
* remote-sim.c (gdbsim_stop_inferior): Rename to ...
(gdbsim_interrupt_inferior): ... this.
(gdbsim_stop): Rename to ...
(gdbsim_interrupt): ... this.
(gdbsim_cntrl_c): Adjust.
(init_gdbsim_ops): Adjust.
* remote.c (sync_remote_interrupt): Adjust comments.
(remote_stop_as): Rename to ...
(remote_interrupt_as): ... this.
(remote_stop): Adjust comment.
(remote_interrupt): New function.
(init_remote_ops): Install remote_interrupt.
* target.c (target_interrupt): New function.
* target.h (struct target_ops) <to_interrupt>: New field.
(target_interrupt): New declaration.
* windows-nat.c (windows_stop): Rename to ...
(windows_interrupt): ... this.
* target-delegates.c: Regenerate.
With "maint set target-non-stop on" we get:
-PASS: gdb.threads/signal-while-stepping-over-bp-other-thread.exp: step
+FAIL: gdb.threads/signal-while-stepping-over-bp-other-thread.exp: step
The issue is simply that switch_back_to_stepped_thread is not used in
non-stop mode, thus infrun doesn't output the expected "switching back
to stepped thread" log.
gdb/testsuite/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* signal-while-stepping-over-bp-other-thread.exp: Expect "restart
threads" as alternative to "switching back to stepped thread".
This finally implements user-visible all-stop mode running with the
target_ops backend always in non-stop mode. This is a stepping stone
towards finer-grained control of threads, being able to do interesting
things like thread groups, associating groups with breakpoints, etc.
From the user's perspective, all-stop mode is really just a special
case of being able to stop and resume specific sets of threads, so it
makes sense to do this step first.
With this, even in all-stop, the target is no longer in charge of
stopping all threads before reporting an event to the core -- the core
takes care of it when it sees fit. For example, when "next"- or
"step"-ing, we can avoid stopping and resuming all threads at each
internal single-step, and instead only stop all threads when we're
about to present the stop to the user.
The implementation is almost straight forward, as the heavy lifting
has been done already in previous patches. Basically, we replace
checks for "set non-stop on/off" (the non_stop global), with calls to
a new target_is_non_stop_p function. In a few places, if "set
non-stop off", we stop all threads explicitly, and in a few other
places we resume all threads explicitly, making use of existing
methods that were added for teaching non-stop to step over breakpoints
without displaced stepping.
This adds a new "maint set target-non-stop on/off/auto" knob that
allows both disabling the feature if we find problems, and
force-enable it for development (useful when teaching a target about
this. The default is "auto", which means the feature is enabled if a
new target method says it should be enabled. The patch implements the
method in linux-nat.c, just for illustration, because it still returns
false. We'll need a few follow up fixes before turning it on by
default. This is a separate target method from indicating regular
non-stop support, because e.g., while e.g., native linux-nat.c is
close to regression free with all-stop-non-stop (with following
patches will fixing the remaining regressions), remote.c+gdbserver
will still need more fixing, even though it supports "set non-stop
on".
Tested on x86_64 Fedora 20, native, with and without "set displaced
off", and with and without "maint set target-non-stop on"; and also
against gdbserver.
gdb/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* NEWS: Mention "maint set/show target-non-stop".
* breakpoint.c (update_global_location_list): Check
target_is_non_stop_p instead of non_stop.
* infcmd.c (attach_command_post_wait, attach_command): Likewise.
* infrun.c (show_can_use_displaced_stepping)
(can_use_displaced_stepping_p, start_step_over_inferior):
Likewise.
(internal_resume_ptid): New function.
(resume): Use it.
(proceed): Check target_is_non_stop_p instead of non_stop. If in
all-stop mode but the target is always in non-stop mode, start all
the other threads that are implicitly resumed too.
(for_each_just_stopped_thread, fetch_inferior_event)
(adjust_pc_after_break, stop_all_threads): Check
target_is_non_stop_p instead of non_stop.
(handle_inferior_event): Likewise. Handle detach-fork in all-stop
with the target always in non-stop mode.
(handle_signal_stop) <random signal>: Check target_is_non_stop_p
instead of non_stop.
(switch_back_to_stepped_thread): Check target_is_non_stop_p
instead of non_stop.
(keep_going_stepped_thread): Use internal_resume_ptid.
(stop_waiting): If in all-stop mode, and the target is in non-stop
mode, stop all threads.
(keep_going_pass): Likewise, when starting a new in-line step-over
sequence.
* linux-nat.c (get_pending_status, select_event_lwp)
(linux_nat_filter_event, linux_nat_wait_1, linux_nat_wait): Check
target_is_non_stop_p instead of non_stop.
(linux_nat_always_non_stop_p): New function.
(linux_nat_stop): Check target_is_non_stop_p instead of non_stop.
(linux_nat_add_target): Install linux_nat_always_non_stop_p.
* target-delegates.c: Regenerate.
* target.c (target_is_non_stop_p): New function.
(target_non_stop_enabled, target_non_stop_enabled_1): New globals.
(maint_set_target_non_stop_command)
(maint_show_target_non_stop_command): New functions.
(_initilize_target): Install "maint set/show target-non-stop"
commands.
* target.h (struct target_ops) <to_always_non_stop_p>: New field.
(target_non_stop_enabled): New declaration.
(target_is_non_stop_p): New declaration.
gdb/doc/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* gdb.texinfo (Maintenance Commands): Document "maint set/show
target-non-stop".
That is, step past breakpoints by:
- pausing all threads
- removing breakpoint at PC
- single-step
- reinsert breakpoint
- restart threads
similarly to all-stop (with displaced stepping disabled). This allows
non-stop to work on targets/architectures without displaced stepping
support. That is, it makes displaced stepping an optimization instead
of a requirement. For example, in principle, all GNU/Linux ports
support non-stop mode at the target_ops level, but not all
corresponding gdbarch's implement displaced stepping. This should
make non-stop work for all (albeit, not as efficiently). And then
there are scenarios where even if the architecture supports displaced
stepping, we can't use it, because we e.g., don't find a usable
address to use as displaced step scratch pad. It should also fix
stepping past watchpoints on targets that have non-continuable
watchpoints in non-stop mode (e.g., PPC, untested). Running the
instruction out of line in the displaced stepping scratch pad doesn't
help that case, as the copied instruction reads/writes the same
watched memory... We can fix that too by teaching GDB to only remove
the watchpoint from the thread that we want to move past the
watchpoint (currently, removing a watchpoint always removes it from
all threads), but again, that can be considered an optimization; not
all targets would support it.
For those familiar with the gdb and gdbserver Linux target_ops
backends, the implementation should look similar, except it is done on
the core side. When we pause threads, we may find they stop with an
interesting event that should be handled later when the thread is
re-resumed, thus we store such events in the thread object, and mark
the event as pending. We should only consume pending events if the
thread is indeed resumed, thus we add a new "resumed" flag to the
thread object. At a later stage, we might add new target methods to
accelerate some of this, like "pause all threads", with corresponding
RSP packets, but we'd still need a fallback method for remote targets
that don't support such packets, so, again, that can be deferred as
optimization.
My _real_ motivation here is making it possible to reimplement
all-stop mode on top of the target always working on non-stop mode, so
that e.g., we can send RSP packets to a remote target even while the
target is running -- can't do that in the all-stop RSP variant, by
design).
Tested on x86_64 Fedora 20, with and without "set displaced off"
forced. The latter forces the new code paths whenever GDB needs to
step past a breakpoint.
gdb/ChangeLog:
2015-08-07 Pedro Alves <pedro@codesourcery.com>
* breakpoint.c (breakpoints_should_be_inserted_now): If any thread
has a pending status, return true.
* gdbthread.h: Include target/waitstatus.h.
(struct thread_suspend_state) <stop_reason, waitstatus_pending_p,
stop_pc>: New fields.
(struct thread_info) <resumed>: New field.
(set_resumed): Declare.
* infrun.c: Include "event-loop.h".
(infrun_async_inferior_event_token, infrun_is_async): New globals.
(infrun_async): New function.
(clear_step_over_info): Add debug output.
(displaced_step_in_progress_any_inferior): New function.
(displaced_step_fixup): New returns int.
(start_step_over): Handle in-line step-overs too. Assert the
thread is marked resumed.
(resume_cleanups): Clear the thread's resumed flag.
(resume): Set the thread's resumed flag. Return early if the
thread has a pending status. Allow stepping a breakpoint with no
signal.
(proceed): Adjust to check 'resumed' instead of 'executing'.
(clear_proceed_status_thread): If the thread has a pending status,
and that status is a finished step, discard the pending status.
(clear_proceed_status): Don't clear step_over_info here.
(random_pending_event_thread, do_target_wait): New functions.
(prepare_for_detach, wait_for_inferior, fetch_inferior_event): Use
do_target_wait.
(wait_one): New function.
(THREAD_STOPPED_BY): New macro.
(thread_stopped_by_watchpoint, thread_stopped_by_sw_breakpoint)
(thread_stopped_by_hw_breakpoint): New functions.
(switch_to_thread_cleanup, save_waitstatus, stop_all_threads): New
functions.
(handle_inferior_event): Also call set_resumed(false) on all
threads implicitly stopped by the event.
(restart_threads, resumed_thread_with_pending_status): New
functions.
(finish_step_over): If we were doing an in-line step-over before,
and no longer are after trying to start a new step-over, restart
all threads. If we have multiple threads with pending events,
save the current event and go through the event loop again.
(handle_signal_stop): Return early if finish_step_over returns
false.
<random signal>: If we get a signal while stepping over a
breakpoint in-line in non-stop mode, restart all threads. Clear
step_over_info before delivering the signal.
(keep_going_stepped_thread): Use internal_error instead of
gdb_assert. Mark the thread as resumed.
(keep_going_pass_signal): Assert the thread isn't already resumed.
If some other thread is doing an in-line step-over, defer the
resume. If we just started a new in-line step-over, stop all
threads. Don't clear step_over_info.
(infrun_async_inferior_event_handler): New function.
(_initialize_infrun): Create async event handler with
infrun_async_inferior_event_handler as callback.
(infrun_async): New declaration.
* target.c (target_async): New function.
* target.h (target_async): Declare macro and readd as function
declaration.
* target/waitstatus.h (enum target_stop_reason)
<TARGET_STOPPED_BY_SINGLE_STEP>: New value.
* thread.c (new_thread): Clear the new waitstatus field.
(set_resumed): New function.
Just a code refactor, no funcionality change intended.
gdb/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* infrun.c (keep_going_stepped_thread): New function, factored out
from ...
(switch_back_to_stepped_thread): ... here.
Clarify that currently_stepping works at a higher level than
target_resume.
gdb/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* infrun.c (currently_stepping): Extend intro comment.
* target.h (target_resume): Extend intro comment.
Several misc cleanups that prepare the tail end of this function, the
part that actually re-resumes the stepped thread.
The most non-obvious would be the currently_stepping change, I guess.
That's because it isn't ever correct to pass step=1 to target_resume
on software single-step targets, and currently_stepping works at a
conceptual higher level, it returns step=true even on software step
targets. It doesn't really matter on hardware step targets, as the
breakpoint will be hit immediately, but it's just wrong on software
step targets. I tested it against my x86 software single-step branch,
and it indeed fixes failed assertions (that catch spurious
PTRACE_SINGLESTEP requests) there.
gdb/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* infrun.c (switch_back_to_stepped_thread): Use ecs->ptid instead
of inferior_ptid. If the stepped thread vanished, return 0
instead of resuming here. Use reset_ecs. Print the prev_pc and
the current stop_pc in log message. Clear trap_expected if the
thread advanced. Don't pass currently_stepping to
do_target_resume.
The main motivation of this patch is sharing more code between the
proceed (starting the inferior for the first time) and keep_going
(restarting the inferior after handling an event) paths and using the
step_over_chain queue now embedded in the thread_info object for
pending in-line step-overs too (instead of just for displaced
stepping).
So this commit:
- splits out a new keep_going_pass_signal function out of keep_going
that is just like keep_going except for the bits that clear the
signal to pass if the signal is set to "handle nopass".
- makes proceed use keep_going too.
- Makes start_step_over use keep_going_pass_signal instead of lower
level displaced stepping things.
One user visible change: if inserting breakpoints while trying to
proceed fails, we now get:
(gdb) si
Warning:
Could not insert hardware watchpoint 7.
Could not insert hardware breakpoints:
You may have requested too many hardware breakpoints/watchpoints.
Command aborted.
(gdb)
while before we only saw warnings with no indication that the command
was cancelled:
(gdb) si
Warning:
Could not insert hardware watchpoint 7.
Could not insert hardware breakpoints:
You may have requested too many hardware breakpoints/watchpoints.
(gdb)
Tested on x86_64-linux-gnu, ppc64-linux-gnu and s390-linux-gnu.
gdb/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* gdbthread.h (struct thread_info) <prev_pc>: Extend comment.
* infrun.c (struct execution_control_state): Move higher up in the
file.
(reset_ecs): New function.
(start_step_over): Now returns int. Rewrite to use
keep_going_pass_signal instead of manually starting a displaced step.
(resume): Don't call set_running here. If displaced stepping
can't start now, clear trap_expected.
(find_thread_needs_step_over): Delete function.
(proceed): Set up finish_thread_state_cleanup. Call set_running.
If the current thread needs a step over, push it in the step-over
chain. Don't set insert breakpoints nor call resume directly
here. Instead rewrite to use start_step_over and
keep_going_pass_signal.
(finish_step_over): New function.
(handle_signal_stop): Call finish_step_over instead of
start_step_over.
(switch_back_to_stepped_thread): If the event thread needs another
step-over do that first. Use start_step_over.
(keep_going_pass_signal): New function, factored out from ...
(keep_going): ... here.
(_initialize_infrun): Comment moved here.
* thread.c (set_running_thread): New function.
(set_running, finish_thread_state): Use set_running_thread.
In order to teach non-stop mode to do in-line step-overs (pause all
threads, remove breakpoint, single-step, reinsert breakpoint, restart
threads), we'll need to be able to queue in-line step over requests,
much like we queue displaced stepping (out-of-line) requests.
Actually, the queue should be the same -- threads wait for their turn
to step past something (breakpoint, watchpoint), doesn't matter what
technique we end up using when the step over actually starts.
I found that the queue management ends up simpler and more efficient
if embedded in the thread objects themselves. This commit converts
the existing displaced stepping queue to that. Later patches will
make the in-line step-overs code paths use it too.
gdb/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* gdbthread.h (struct thread_info) <step_over_prev,
step_over_next>: New fields.
(thread_step_over_chain_enqueue, thread_step_over_chain_remove)
(thread_step_over_chain_next, thread_is_in_step_over_chain): New
declarations.
* infrun.c (struct displaced_step_request): Delete.
(struct displaced_step_inferior_state) <step_request_queue>:
Delete field.
(displaced_step_prepare): Assert that trap_expected is set. Use
thread_step_over_chain_enqueue. Split starting a new displaced
step to ...
(start_step_over): ... this new function.
(resume): Assert the thread isn't waiting for a step over already.
(proceed): Assert the thread isn't waiting for a step over
already.
(infrun_thread_stop_requested): Adjust to remove threads from the
embedded step-over chain.
(handle_inferior_event) <fork/vfork>: Call start_step_over after
displaced_step_fixup.
(handle_signal_stop): Call start_step_over after
displaced_step_fixup.
* infrun.h (step_over_queue_head): New declaration.
* thread.c (step_over_chain_enqueue, step_over_chain_remove)
(thread_step_over_chain_next, thread_is_in_step_over_chain)
(thread_step_over_chain_enqueue)
(thread_step_over_chain_remove): New functions.
(delete_thread_1): Remove thread from the step-over chain.
I noticed that even though keep_going knows to start a step over for a
watchpoint, thread_still_needs_step_over forgets it.
gdb/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* infrun.c (thread_still_needs_step_over): Rename to ...
(thread_still_needs_step_over_bp): ... this.
(enum step_over_what): New.
(thread_still_needs_step_over): Reimplement.
Even though "target remote" supports target-async, the all-stop
target_wait implementation ignores TARGET_WNOHANG. If the core
happens to poll for events and we've already read the stop reply out
of the serial/socket, remote_wait_as hangs forever instead of
returning an indication that there are no events to process. This
can't happen currently, but later changes will trigger this.
gdb/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* remote.c (remote_wait_as): If not waiting for a stop reply,
return TARGET_WAITKIND_NO_RESUMED. If TARGET_WNOHANG is
requested, don't block waiting forever.
Prepare to use it in contexts without an ecs handy. Follow up patches
will make use of this.
gdb/ChangeLog:
2015-08-07 Pedro Alves <pedro@codesourcery.com>
* infrun.c (adjust_pc_after_break): Now takes thread_info and
waitstatus pointers instead of an ecs. Adjust.
(handle_inferior_event): Adjust caller.
Letting a "checkpoint" run to exit with "set non-stop on" behaves
differently compared to the default all-stop mode ("set non-stop
off").
Currently, in non-stop mode:
(gdb) start
Temporary breakpoint 1 at 0x40086b: file src/gdb/testsuite/gdb.base/checkpoint.c, line 28.
Starting program: build/gdb/testsuite/gdb.base/checkpoint
Temporary breakpoint 1, main () at src/gdb/testsuite/gdb.base/checkpoint.c:28
28 char *tmp = &linebuf[0];
(gdb) checkpoint
checkpoint 1: fork returned pid 24948.
(gdb) c
Continuing.
Copy complete.
Deleting copy.
[Inferior 1 (process 24944) exited normally]
[Switching to process 24948]
(gdb) info threads
Id Target Id Frame
1 process 24948 "checkpoint" (running)
No selected thread. See `help thread'.
(gdb) c
The program is not being run.
(gdb)
Two issues above:
1. Thread 1 got stuck in "(running)" state (it isn't really running)
2. While checkpoints try to preserve the illusion that the thread is
still the same when the process exits, GDB switched to "No thread
selected." instead of staying with thread 1 selected.
Problem #1 is caused by handle_inferior_event and normal_stop not
considering that when a
TARGET_WAITKIND_SIGNALLED/TARGET_WAITKIND_EXITED event is reported,
and the inferior is mourned, the target may still have execution.
Problem #2 is caused by the make_cleanup_restore_current_thread
cleanup installed by fetch_inferior_event not being able to find the
original thread 1's ptid in the thread list, thus not being able to
restore thread 1 as selected thread. The fix is to make the cleanup
installed by make_cleanup_restore_current_thread aware of thread ptid
changes, by installing a thread_ptid_changed observer that adjusts the
cleanup's data.
After the patch, we get the same in all-stop and non-stop modes:
(gdb) c
Continuing.
Copy complete.
Deleting copy.
[Inferior 1 (process 25109) exited normally]
[Switching to process 25113]
(gdb) info threads
Id Target Id Frame
* 1 process 25113 "checkpoint" main () at src/gdb/testsuite/gdb.base/checkpoint.c:28
(gdb)
Turns out the whole checkpoints.exp file can run in non-stop mode
unmodified. I thought of moving most of the test file's contents to a
procedure that can be called twice, once in non-stop mode and another
in all-stop mode. But then, the test already takes close to 30
seconds to run on my machine, so I thought it'd be nicer to run
all-stop and non-stop mode in parallel. Thus I added a new
checkpoint-ns.exp file that just appends "set non-stop on" to GDBFLAGS
and sources checkpoint.exp.
gdb/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* infrun.c (handle_inferior_event): If we get
TARGET_WAITKIND_SIGNALLED or TARGET_WAITKIND_EXITED in non-stop
mode, mark all threads of the exiting process as not-executing.
(normal_stop): If we get TARGET_WAITKIND_SIGNALLED or
TARGET_WAITKIND_EXITED in non-stop mode, finish all threads of the
exiting process, if inferior_ptid still points at a process.
* thread.c (struct current_thread_cleanup) <next>: New field.
(current_thread_cleanup_chain): New global.
(restore_current_thread_ptid_changed): New function.
(restore_current_thread_cleanup_dtor): Remove the cleanup from the
current_thread_cleanup_chain list.
(make_cleanup_restore_current_thread): Add the cleanup data to the
current_thread_cleanup_chain list.
(_initialize_thread): Install restore_current_thread_ptid_changed
as thread_ptid_changed observer.
gdb/testsuite/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* gdb.base/checkpoint-ns.exp: New file.
* gdb.base/checkpoint.exp: Pass explicit "checkpoint.c" to
standard_testfile.
On x86-solaris 10, we noticed that starting a program would sometimes
cause the debugger to crash. For instance:
% gdb a
(gdb) break adainit
Breakpoint 1 at 0x8051f03
(gdb) run
Starting program: /[...]/a
[Thread debugging using libthread_db enabled]
zsh: 24398 segmentation fault (core dumped) /[...]/gdb a
The exception occurs in dtrace_process_dof_probe, while trying
to process each probe referenced by a DTRACE_DOF_SECT_TYPE_PROVIDER
DOF section from /lib/libc.so.1. For reference, the ELF section
in that shared library providing the DOF data has the following
characteristics:
Idx Name Size VMA LMA File off Algn
14 .SUNW_dof 0000109d 000b4398 000b4398 000b4398 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
The function dtrace_process_dof gets passed the contents of that
ELF section, which allows it to determine the location of the table
where all DOF sections are described. I dumped the contents of
each DOF section as seen by GDB, and it seemed to be plausible,
because the offset of each DOF section was pretty much equal to
the sum of the offset and size of the previous DOF section. Also,
the offset + sum of the last section corresponds to the size of
the .SUNW_dof section.
Things start to break down when processing one of the DOF sections
that has a type of DTRACE_DOF_SECT_TYPE_PROVIDER. It gets the contents
of this DOF section via:
struct dtrace_dof_provider *provider = (struct dtrace_dof_provider *)
DTRACE_DOF_PTR (dof, DOF_UINT (dof, section->dofs_offset));
Said more simply, the struct dtrace_dof_provider data is at
section->dofs_offset of the entire DOF contents. Given that
the contents of SECTION seemed to make sense, so far so good.
However, what SECTION tells us is that our DOF provider section
is 40 bytes long:
(gdb) print *section
$36 = {dofs_type = 15, dofs_align = 4, dofs_flags = 1,
dofs_entsize = 0, dofs_offset = 3264, dofs_size = 40}
^^^^^^^^^^^^^^
But on the other hand:
(gdb) p sizeof (struct dtrace_dof_provider)
$54 = 44
In other words GDB expected a bigger DOF section and when we try to
fetch the value of the last field of that DOF section (dofpv_prenoffs)...
eoffsets_s = DTRACE_DOF_SECT (dof,
DOF_UINT (dof, provider->dofpv_prenoffs));
... we end up reading data that actually belongs to another DOF
section, and therefore irrelevant. This in turn means that the value
of eofftab gets incorrectly set, since it depends on eoffsets_s:
eofftab = DTRACE_DOF_PTR (dof, DOF_UINT (dof, eoffsets_s->dofs_offset));
This invalid address quickly catches up to us when we pass it to
dtrace_process_dof_probe shortly after, where we crash because
we try to subscript it:
Program received signal SIGSEGV, Segmentation fault.
0x08155bba in dtrace_process_dof_probe ([...]) at [...]/dtrace-probe.c:378
378 = ((uint32_t *) eofftab)[...];
This patch fixes the issue by detecting provider DOF sections
that are smaller than expected, and discarding the DOF data.
gdb/ChangeLog:
* dtrace-probe.c (dtrace_process_dof): Ignore the objfile's DOF
data if a DTRACE_DOF_SECT_TYPE_PROVIDER section is found to be
smaller than expected.
The hidden versioned symbol can only be merged with the versioned
symbol with the same symbol version. _bfd_elf_merge_symbol should
check the symbol version before merging the new hidden versioned
symbol with the existing symbol. _bfd_elf_link_hash_copy_indirect can't
copy any references to the hidden versioned symbol. We need to
bind a symbol locally when linking executable if it is locally defined,
hidden versioned, not referenced by shared library and not exported.
bfd/
PR ld/18720
* elflink.c (_bfd_elf_merge_symbol): Add a parameter to indicate
if the new symbol matches the existing one. The new hidden
versioned symbol matches the existing symbol if they have the
same symbol version. Update the existing symbol only if they
match.
(_bfd_elf_add_default_symbol): Update call to
_bfd_elf_merge_symbol.
(_bfd_elf_link_assign_sym_version): Don't set the hidden field
here.
(elf_link_add_object_symbols): Override a definition only if the
new symbol matches the existing one.
(_bfd_elf_link_hash_copy_indirect): Don't copy any references to
the hidden versioned symbol.
(elf_link_output_extsym): Bind a symbol locally when linking
executable if it is locally defined, hidden versioned, not
referenced by shared library and not exported. Turn on
VERSYM_HIDDEN only if the hidden vesioned symbol is defined
locally.
ld/testsuite/
PR ld/18720
* ld-elf/indirect.exp: Run tests for PR ld/18720.
* ld-elf/pr18720.out: New file.
* ld-elf/pr18720a.c: Likewise.
* ld-elf/pr18720b.c: Likewise.
* ld-elf/pr18720c.c: Likewise.
The get_frame_language feels like it would be more at home in frame.c
rather than in stack.c, while the declaration, that is currently in
language.h can be moved into frame.h to match.
A couple of new includes are added, but otherwise no substantial change
here.
gdb/ChangeLog:
* stack.c (get_frame_language): Moved ...
* frame.c (get_frame_language): ... to here.
* language.h (get_frame_language): Declaration moved to frame.h.
* frame.h: Add language.h include, for language enum.
(get_frame_language): Declaration moved from language.h.
* language.c: Add frame.h include.
* top.c: Add frame.h include.
* symtab.h (struct obj_section): Declare.
(struct cmd_list_element): Declare.
As part of a drive to remove deprecated_safe_get_selected_frame, make
the get_frame_language function take a frame parameter. Given the name
of the function this actually seems to make a lot of sense.
The task of fetching a suitable frame is then passed to the calling
functions. For get_frame_language there are not many callers, these are
updated to get the selected frame in a suitable way.
gdb/ChangeLog:
* language.c (show_language_command): Find selected frame before
asking for the language of that frame.
(set_language_command): Likewise.
* language.h (get_frame_language): Add frame parameter.
* stack.c (get_frame_language): Add frame parameter, assert
parameter is not NULL, update comment and reindent.
* top.c (check_frame_language_change): Pass the selected frame
into get_frame_language.
When using options such as --localize-symbol, --globalize-symbol, etc,
along with the --wildcard option, prefixing a symbol name with '!'
should provide non-matching behaviour, as example the following example
is given in the manual:
--wildcard --weaken-symbol !foo --weaken-symbol fo*
which should weaken all symbols matching the pattern 'fo*', but not the
symbol 'foo'.
However, this currently does not work, the current logic will waken all
symbols matching the pattern 'fo*' AND all symbols that are not 'foo'.
The symbol 'foo' is covered by the first condition, and so is weakened,
while, other symbols, for example 'bar' will match the second condition,
and so be weakened.
This patch adjusts the logic so that a pattern prefixed with '!'
specifically DOES NOT apply the relevant change to any matching symbols,
instead of applying the change to all non-matching symbols. So this:
--weaken-symbol !foo
will ensure that the symbol 'foo' is not weakened, but says nothing
about symbols that are not 'foo'. As a result, a pattern prefixed with
'!' now only makes sense when used alongside a more wide ranging
wildcard pattern.
This change should make the wildcard matching feature more useful, with
no overall loss of functionality. The example given in the manual,
weaken all symbols matching 'fo*' except 'foo' can now be achieved, but
so too can more complex examples, such as weaken all symbols matching
'fo*' except 'foo', 'foa', and 'fob', like this:
--wildcard --weaken-symbol !foo \
--weaken-symbol !foa \
--weaken-symbol !fob \
--weaken-symbol fo*
Under the previous scheme, something as symbols as, weaken all symbols
except 'foo' could have been achieved with this:
--weaken-symbol !foo
however, this will no longer work. To achieve the same result under the
new scheme this is now required:
--weaken-symbol !foo --weaken-symbol *
binutils/ChangeLog:
* objcopy.c (is_specified_symbol_predicate): Don't stop at first
match. Non-match rules set found to FALSE.
binutils/testsuite/ChangeLog:
* binutils-all/objcopy.exp: Run new symbol tests.
(objcopy_test_symbol_manipulation): New function.
* binutils-all/symbols-1.d: New file.
* binutils-all/symbols-2.d: New file.
* binutils-all/symbols-3.d: New file.
* binutils-all/symbols-4.d: New file.
* binutils-all/symbols.s: New file.
Indicate speculatively executed instructions with a leading '?'. We use the
space that is normally used for the PC prefix. In the case where the
instruction at the current PC had been executed speculatively before, the PC
prefix will be partially overwritten resulting in "?> ".
As a side-effect, the /p modifier to omit the PC prefix in the "record
instruction-history" command now uses a 3-space PC prefix " " in order to
have enough space for the speculative execution indication.
gdb/
* btrace.c (btrace_compute_ftrace_bts): Clear insn flags.
(pt_btrace_insn_flags): New.
(ftrace_add_pt): Call pt_btrace_insn_flags.
* btrace.h (btrace_insn_flag): New.
(btrace_insn) <flags>: New.
* record-btrace.c (btrace_insn_history): Print insn prefix.
* NEWS: Announce it.
doc/
* gdb.texinfo (Process Record and Replay): Document prefixing of
speculatively executed instructions in the "record instruction-history"
command.
testsuite/
* gdb.btrace/instruction_history.exp: Update.
* gdb.btrace/tsx.exp: New.
* gdb.btrace/tsx.c: New.
* lib/gdb.exp (skip_tsx_tests, skip_btrace_pt_tests): New.
Intel(R) Processor Trace support requires a recent linux/perf_event.h header.
When GDB is built on an older system, Intel(R) Processor Trace will not be
available and there is no indication in the configure and build log as to
what went wrong.
Check for a compatible linux/perf_event.h at configure-time.
gdb/
* configure.ac: Check for PERF_ATTR_SIZE_VER5 in linux/perf_event.h
* configure: Regenerate.
The buildbot shows that PPC64 and x86_64 builders, both native and
extended-remote gdbserver frequently timeout these tests.
until-precsave.exp times out on my x86_64 occasionally as well.
Inspecting the logs, we see that if we waited some more, the tests
would pass.
Simply bump until-precsave.exp timeouts further, and apply the same
treatment to step-precsave.exp.
gdb/testsuite/ChangeLog:
2015-08-06 Pedro Alves <palves@redhat.com>
* gdb.reverse/step-precsave.exp: Use with_timeout_factor to
increase timeout.
* gdb.reverse/until-precsave.exp: Bump timeouts.
This test fails with --target_board=native-extended-gdbserver because
it misses the usual "disconnect":
(gdb) target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=30454
Already connected to a remote target. Disconnect? (y or n) n
Still connected.
(gdb) FAIL: gdb.base/valgrind-infcall.exp: target remote for vgdb (got interactive prompt)
gdb/testsuite/ChangeLog:
2015-08-06 Pedro Alves <palves@redhat.com>
* gdb.base/valgrind-infcall.exp: Issue a "disconnect".
This patch is mostly extracted from Pedro's C++ branch. It adds explicit
casts from integer to enum types, where it is really the intention to do
so. This could be because we are ...
* iterating on enum values (we need to iterate on an equivalent integer)
* converting from a value read from bytes (dwarf attribute, agent
expression opcode) to the equivalent enum
* reading the equivalent integer value from another language (Python/Guile)
An exception to that is the casts in regcache.c. It seems to me like
struct regcache's register_status field could be a pointer to an array of
enum register_status. Doing so would waste a bit of memory (4 bytes
used by the enum vs 1 byte used by the current signed char, for each
register). If we switch to C++11 one day, we can define the underlying
type of an enum type, so we could have the best of both worlds.
gdb/ChangeLog:
* arm-tdep.c (set_fp_model_sfunc): Add cast from integer to enum.
(arm_set_abi): Likewise.
* ax-general.c (ax_print): Likewise.
* c-exp.y (exp : string_exp): Likewise.
* compile/compile-loc2c.c (compute_stack_depth_worker): Likewise.
(do_compile_dwarf_expr_to_c): Likewise.
* cp-name-parser.y (demangler_special : DEMANGLER_SPECIAL start):
Likewise.
* dwarf2expr.c (execute_stack_op): Likewise.
* dwarf2loc.c (dwarf2_compile_expr_to_ax): Likewise.
(disassemble_dwarf_expression): Likewise.
* dwarf2read.c (dwarf2_add_member_fn): Likewise.
(read_array_order): Likewise.
(abbrev_table_read_table): Likewise.
(read_attribute_value): Likewise.
(skip_unknown_opcode): Likewise.
(dwarf_decode_macro_bytes): Likewise.
(dwarf_decode_macros): Likewise.
* eval.c (value_f90_subarray): Likewise.
* guile/scm-param.c (gdbscm_make_parameter): Likewise.
* i386-linux-tdep.c (i386_canonicalize_syscall): Likewise.
* infrun.c (handle_command): Likewise.
* memory-map.c (memory_map_start_memory): Likewise.
* osabi.c (set_osabi): Likewise.
* parse.c (operator_length_standard): Likewise.
* ppc-linux-tdep.c (ppc_canonicalize_syscall): Likewise, and use
single return point.
* python/py-frame.c (gdbpy_frame_stop_reason_string): Likewise.
* python/py-symbol.c (gdbpy_lookup_symbol): Likewise.
(gdbpy_lookup_global_symbol): Likewise.
* record-full.c (record_full_restore): Likewise.
* regcache.c (regcache_register_status): Likewise.
(regcache_raw_read): Likewise.
(regcache_cooked_read): Likewise.
* rs6000-tdep.c (powerpc_set_vector_abi): Likewise.
* symtab.c (initialize_ordinary_address_classes): Likewise.
* target-debug.h (target_debug_print_signals): Likewise.
* utils.c (do_restore_current_language): Likewise.
Fixes another C++ -fpermissive error:
src/gdb/gdbserver/tracepoint.c:4535:21: error: invalid conversion from ‘int’ to ‘eval_result_type’ [-fpermissive]
expr_eval_result = ipa_expr_eval_result;
gdb/gdbserver/ChangeLog:
2015-08-06 Pedro Alves <palves@redhat.com>
* tracepoint.c (expr_eval_result): Now an int.
The regcache used to be hidden inside inferiors.c, but since the
tracepoints support that it's a first class object. This also fixes a
few implicit pointer conversion errors in C++ mode, caused by a few
places missing the explicit cast.
gdb/gdbserver/ChangeLog:
2015-08-06 Pedro Alves <palves@redhat.com>
* gdbthread.h (struct regcache): Forward declare.
(struct thread_info) <regcache_data>: Now a struct regcache
pointer.
* inferiors.c (inferior_regcache_data)
(set_inferior_regcache_data): Now work with struct regcache
pointers.
* inferiors.h (struct regcache): Forward declare.
(inferior_regcache_data, set_inferior_regcache_data): Now work
with struct regcache pointers.
* regcache.c (get_thread_regcache, regcache_invalidate_thread)
(free_register_cache_thread): Remove struct regcache pointer
casts.
Running gdb.threads/process-dies-while-handling-bp.exp against
gdbserver sometimes FAILs because GDBserver drops the connection, but
the logs leave no clue on what the reason could be. Running manually
a few times, I saw the same:
$ ./gdbserver/gdbserver --multi :9999 testsuite/gdb.threads/process-dies-while-handling-bp
Process testsuite/gdb.threads/process-dies-while-handling-bp created; pid = 12766
Listening on port 9999
Remote debugging from host 127.0.0.1
Listening on port 9999
Child exited with status 0
Child exited with status 0
What happened is that an exception escaped and gdbserver reopened the
connection, which led to that second "Listening on port 9999" output.
The error was a failure to access registers from a now-dead thread.
The exception probably shouldn't have escaped here, but meanwhile,
this at least makes the issue less mysterious.
Tested on x86_64 Fedora 20.
gdb/gdbserver/ChangeLog:
2015-08-06 Pedro Alves <palves@redhat.com>
* server.c (captured_main): On error, print the exception message
to stderr, and if run_once is set, throw a quit.
Found while processing the C++ enum changes. It seems like series
should be of type enum complaint_series, instead of adding a cast.
Redundant and out of date comments are also removed.
gdb/ChangeLog:
* complaints.c (enum complaint_series): Add newlines and remove
out of date comment.
(struct complaints) <series>: Change type to enum
complaint_series and remove out of date comment.
(symfile_complaint_hook): Use equivalent enum value
ISOLATED_MESSAGE instead of 0.
While hacking on the fix for PR threads/18600 (Threads left stopped
after fork+thread spawn), I once saw its test (fork-plus-threads.exp)
FAIL against gdbserver because move_out_of_jump_pad_callback has a
gdb_breakpoint_here call, and the caller isn't making sure the current
thread points to the right thread. In the case I saw, the current
thread pointed to the wrong process, so gdb_breakpoint_here returned
the wrong answer. Unfortunately I didn't save logs. Still, seems
obvious enough and it should fix a potential occasional racy FAIL.
Tested on x86_64 Fedora 20.
gdb/gdbserver/ChangeLog:
2015-08-06 Pedro Alves <palves@redhat.com>
* linux-low.c (move_out_of_jump_pad_callback): Temporarily switch
the current thread.
Running gdbserver --debug under Valgrind shows:
==4803== Invalid read of size 4
==4803== at 0x432B62: linux_write_memory (linux-low.c:5320)
==4803== by 0x4143F7: write_inferior_memory (target.c:83)
==4803== by 0x415895: remove_memory_breakpoint (mem-break.c:362)
==4803== by 0x432EF5: linux_remove_point (linux-low.c:5460)
==4803== by 0x416319: delete_raw_breakpoint (mem-break.c:802)
==4803== by 0x4163F3: release_breakpoint (mem-break.c:842)
==4803== by 0x416477: delete_breakpoint_1 (mem-break.c:869)
==4803== by 0x4164EF: delete_breakpoint (mem-break.c:891)
==4803== by 0x416843: delete_gdb_breakpoint_1 (mem-break.c:1069)
==4803== by 0x4168D8: delete_gdb_breakpoint (mem-break.c:1098)
==4803== by 0x4134E3: process_serial_event (server.c:4051)
==4803== by 0x4138E4: handle_serial_event (server.c:4196)
==4803== Address 0x4c6b930 is 0 bytes inside a block of size 1 alloc'd
==4803== at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==4803== by 0x4240C6: xmalloc (common-utils.c:43)
==4803== by 0x41439C: write_inferior_memory (target.c:80)
==4803== by 0x415895: remove_memory_breakpoint (mem-break.c:362)
==4803== by 0x432EF5: linux_remove_point (linux-low.c:5460)
==4803== by 0x416319: delete_raw_breakpoint (mem-break.c:802)
==4803== by 0x4163F3: release_breakpoint (mem-break.c:842)
==4803== by 0x416477: delete_breakpoint_1 (mem-break.c:869)
==4803== by 0x4164EF: delete_breakpoint (mem-break.c:891)
==4803== by 0x416843: delete_gdb_breakpoint_1 (mem-break.c:1069)
==4803== by 0x4168D8: delete_gdb_breakpoint (mem-break.c:1098)
==4803== by 0x4134E3: process_serial_event (server.c:4051)
==4803==
And:
==7272== Conditional jump or move depends on uninitialised value(s)
==7272== at 0x3615E48361: vfprintf (vfprintf.c:1634)
==7272== by 0x414E89: debug_vprintf (debug.c:60)
==7272== by 0x42800A: debug_printf (common-debug.c:35)
==7272== by 0x43937B: my_waitpid (linux-waitpid.c:149)
==7272== by 0x42D740: linux_wait_for_event_filtered (linux-low.c:2441)
==7272== by 0x42DADA: linux_wait_for_event (linux-low.c:2552)
==7272== by 0x42E165: linux_wait_1 (linux-low.c:2860)
==7272== by 0x42F5D8: linux_wait (linux-low.c:3453)
==7272== by 0x4144A4: mywait (target.c:107)
==7272== by 0x413969: handle_target_event (server.c:4214)
==7272== by 0x41A1A6: handle_file_event (event-loop.c:429)
==7272== by 0x41996D: process_event (event-loop.c:184)
gdb/ChangeLog:
2015-08-06 Pedro Alves <palves@redhat.com>
* nat/linux-waitpid.c (my_waitpid): Only print *status if waitpid
returned > 0.
gdb/gdbserver/ChangeLog:
2015-08-06 Pedro Alves <palves@redhat.com>
* linux-low.c (linux_write_memory): Rewrite debug output to avoid
reading beyond the passed in buffer length.
This adds a kfailed test that has the whole process exit just while
several threads continuously step over a breakpoint. Usually, the
process exits just while GDB or GDBserver is handling the breakpoint
hit. In other words, the process disappears while the event thread is
(ptrace-) stopped. This exposes several issues in GDB and GDBserver.
Errors, crashes, etc.
I fixed some of these issues recently, but there's a lot more to do.
It's a bit like playing whack-a-mole at the moment. You fix an issue,
which then exposes several others.
E.g., with the native target, you get (among other errors):
(...)
[New Thread 0x7ffff47b9700 (LWP 18077)]
[New Thread 0x7ffff3fb8700 (LWP 18078)]
[New Thread 0x7ffff37b7700 (LWP 18079)]
Cannot find user-level thread for LWP 18076: generic error
(gdb) KFAIL: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (prompt) (PRMS: gdb/18749)
gdb/testsuite/ChangeLog:
2015-08-06 Pedro Alves <palves@redhat.com>
PR gdb/18749
* gdb.threads/process-dies-while-handling-bp.c: New file.
* gdb.threads/process-dies-while-handling-bp.exp: New file.
This field was never set nor used. This patch removes it.
gdb/ChangeLog:
* common/agent.c (symbol_list) <required>: Remove.
gdb/gdbserver/ChangeLog:
* tracepoint.c (symbol_list) <required>: Remove.
Ref: https://sourceware.org/ml/gdb-patches/2015-07/msg00868.html
This adds a test that has a multithreaded program have several threads
continuously fork, while another thread continuously steps over a
breakpoint.
This exposes several intertwined issues, which this patch addresses:
- When we're stopping and suspending threads, some thread may fork,
and we missed setting its suspend count to 1, like we do when a new
clone/thread is detected. When we next unsuspend threads, the fork
child's suspend count goes below 0, which is bogus and fails an
assertion.
- If a step-over is cancelled because a signal arrives, but then gdb
is not interested in the signal, we pass the signal straight back
to the inferior. However, we miss that we need to re-increment the
suspend counts of all other threads that had been paused for the
step-over. As a result, other threads indefinitely end up stuck
stopped.
- If a detach request comes in just while gdbserver is handling a
step-over (in the test at hand, this is GDB detaching the fork
child), gdbserver internal errors in stabilize_thread's helpers,
which assert that all thread's suspend counts are 0 (otherwise we
wouldn't be able to move threads out of the jump pads). The
suspend counts aren't 0 while a step-over is in progress, because
all threads but the one stepping past the breakpoint must remain
paused until the step-over finishes and the breakpoint can be
reinserted.
- Occasionally, we see "BAD - reinserting but not stepping." being
output (from within linux_resume_one_lwp_throw). That was because
GDB pokes memory while gdbserver is busy with a step-over, and that
suspends threads, and then re-resumes them with proceed_one_lwp,
which missed another reason to tell linux_resume_one_lwp that the
thread should be set back to stepping.
- In a couple places, we were resuming threads that are meant to be
suspended. E.g., when a vCont;c/s request for thread B comes in
just while gdbserver is stepping thread A past a breakpoint. The
resume for thread B must be deferred until the step-over finishes.
- The test runs with both "set detach-on-fork" on and off. When off,
it exercises the case of GDB detaching the fork child explicitly.
When on, it exercises the case of gdb resuming the child
explicitly. In the "off" case, gdb seems to exponentially become
slower as new inferiors are created. This is _very_ noticeable as
with only 100 inferiors gdb is crawling already, which makes the
test take quite a bit to run. For that reason, I've disabled the
"off" variant for now.
gdb/ChangeLog:
2015-08-06 Pedro Alves <palves@redhat.com>
* target/waitstatus.h (enum target_stop_reason)
<TARGET_STOPPED_BY_SINGLE_STEP>: New value.
gdb/gdbserver/ChangeLog:
2015-08-06 Pedro Alves <palves@redhat.com>
* linux-low.c (handle_extended_wait): Set the fork child's suspend
count if stopping and suspending threads.
(check_stopped_by_breakpoint): If stopped by trace, set the LWP's
stop reason to TARGET_STOPPED_BY_SINGLE_STEP.
(linux_detach): Complete an ongoing step-over.
(lwp_suspended_inc, lwp_suspended_decr): New functions. Use
throughout.
(resume_stopped_resumed_lwps): Don't resume a suspended thread.
(linux_wait_1): If passing a signal to the inferior after
finishing a step-over, unsuspend and re-resume all lwps. If we
see a single-step event but the thread should be continuing, don't
pass the trap to gdb.
(stuck_in_jump_pad_callback, move_out_of_jump_pad_callback): Use
internal_error instead of gdb_assert.
(enqueue_pending_signal): New function.
(check_ptrace_stopped_lwp_gone): Add debug output.
(start_step_over): Use internal_error instead of gdb_assert.
(complete_ongoing_step_over): New function.
(linux_resume_one_thread): Don't resume a suspended thread.
(proceed_one_lwp): If the LWP is stepping over a breakpoint, reset
it stepping.
gdb/testsuite/ChangeLog:
2015-08-06 Pedro Alves <palves@redhat.com>
* gdb.threads/forking-threads-plus-breakpoint.exp: New file.
* gdb.threads/forking-threads-plus-breakpoint.c: New file.
The tail end of linux_wait_1 isn't expecting that the select_event_lwp
machinery can pick a whole-process exit event to report to GDB. When
that happens, both gdb and gdbserver end up quite confused:
...
(gdb)
[Thread 24971.24971] #1 stopped.
0x0000003615a011f0 in ?? ()
c&
Continuing.
(gdb) [New Thread 24971.24981]
[New Thread 24983.24983]
[New Thread 24971.24982]
[Thread 24983.24983] #3 stopped.
0x0000003615ebc7cc in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/fork.c:130
130 pid = ARCH_FORK ();
[New Thread 24984.24984]
Error in re-setting breakpoint -16: PC register is not available
Error in re-setting breakpoint -17: PC register is not available
Error in re-setting breakpoint -18: PC register is not available
Error in re-setting breakpoint -19: PC register is not available
Error in re-setting breakpoint -24: PC register is not available
Error in re-setting breakpoint -25: PC register is not available
Error in re-setting breakpoint -26: PC register is not available
Error in re-setting breakpoint -27: PC register is not available
Error in re-setting breakpoint -28: PC register is not available
Error in re-setting breakpoint -29: PC register is not available
Error in re-setting breakpoint -30: PC register is not available
PC register is not available
(gdb)
gdb/gdbserver/ChangeLog:
2015-08-06 Pedro Alves <palves@redhat.com>
* linux-low.c (add_lwp): Set waitstatus to TARGET_WAITKIND_IGNORE.
(linux_thread_alive): Use lwp_is_marked_dead.
(extended_event_reported): Delete.
(linux_wait_1): Check if waitstatus is TARGET_WAITKIND_IGNORE
instead of extended_event_reported.
(mark_lwp_dead): Don't set the 'dead' flag. Store the waitstatus
as well.
(lwp_is_marked_dead): New function.
(lwp_running): Use lwp_is_marked_dead.
* linux-low.h: Delete 'dead' field, and update 'waitstatus's
comment.
The "extended event with waitstatus" debug output is unreachable, as
it is guarded by "if (!report_to_gdb)". If extended_event_reported is
true, then so is report_to_gdb. Move it to where we print why we're
reporting an event to GDB.
Also, the debug output currently tries to print the wrong struct
target_waitstatus.
gdb/gdbserver/ChangeLog:
2015-08-06 Pedro Alves <palves@redhat.com>
* linux-low.c (linux_wait_1): Move fork event output out of the
!report_to_gdb check. Pass event_child->waitstatus to
target_waitstatus_to_string instead of ourstatus.
Reverts a2c59f28 and e474ab13. Since the unary form of ALIGN only
references "dot" implicitly, there isn't really a strong argument for
making ALIGN use a relative value when inside an output section.
* ldexp.c (align_dot_val): Delete.
(fold_unary <ALIGN_K, NEXT>): Revert 2015-07-10 change.
(is_align_conditional): Revert 2015-07-20 change.
(exp_fold_tree_1): Likewise, but keep expanded comment.
* scripttempl/elf.sc (.ldata, .bss): Revert 2015-07-20 change.
* ld.texinfo (<ALIGN>): Correct description.
At https://sourceware.org/ml/gdb-patches/2015-08/msg00097.html, Joel
observed that trying to next/step a program on GNU/Linux sometimes
results in the following failed assertion:
% gdb -q .obj/gprof/main
(gdb) start
(gdb) n
(gdb) step
[...]/infrun.c:2391: internal-error:
resume: Assertion `sig != GDB_SIGNAL_0' failed.
What happened is that, during the "next" operation, GDB hit a
longjmp/exception/step-resume breakpoint but failed to see that this
breakpoint was set for a different thread than the one being stepped.
Joel's detailed analysis follows:
More precisely, at the end of the "start" command, we are stopped at
the start of function Main in main.adb; there are 4 threads in total,
and we are in the main thread (which is thread 1):
(gdb) info thread
Id Target Id Frame
4 Thread 0xb7a56ba0 (LWP 28379) 0xffffe410 in __kernel_vsyscall ()
3 Thread 0xb7c5aba0 (LWP 28378) 0xffffe410 in __kernel_vsyscall ()
2 Thread 0xb7e5eba0 (LWP 28377) 0xffffe410 in __kernel_vsyscall ()
* 1 Thread 0xb7ea18c0 (LWP 28370) main () at /[...]/main.adb:57
All the logs below reference Thread ID/LWP, but it'll be easier to
talk about the threads by GDB thread number. For instance, thread 1
is LWP 28370 while thread 3 is LWP 28378. So, the explanations below
translate the LWPs into thread numbers.
Back to what happens while we are trying to "next' our program:
(gdb) n
infrun: clear_proceed_status_thread (Thread 0xb7a56ba0 (LWP 28379))
infrun: clear_proceed_status_thread (Thread 0xb7c5aba0 (LWP 28378))
infrun: clear_proceed_status_thread (Thread 0xb7e5eba0 (LWP 28377))
infrun: clear_proceed_status_thread (Thread 0xb7ea18c0 (LWP 28370))
infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT)
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 0xb7ea18c0 (LWP 28370)] at 0x805451e
infrun: target_wait (-1.0.0, status) =
infrun: 28370.28370.0 [Thread 0xb7ea18c0 (LWP 28370)],
infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x8054523
We've resumed thread 1 (LWP 28370), and received in return a signal
that the same thread stopped slightly further. It's still in the
range of instructions for the line of source we started the "next"
from, as evidenced by the following trace...
infrun: stepping inside range [0x805451e-0x8054531]
... and thus, we decide to continue stepping the same thread:
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 0xb7ea18c0 (LWP 28370)] at 0x8054523
infrun: prepare_to_wait
That's when we get an event from a different thread (thread 3)...
infrun: target_wait (-1.0.0, status) =
infrun: 28370.28378.0 [Thread 0xb7c5aba0 (LWP 28378)],
infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x80782d0
infrun: context switch
infrun: Switching context from Thread 0xb7ea18c0 (LWP 28370) to Thread 0xb7c5aba0 (LWP 28378)
... which we find to be at the address where we set a breakpoint on
"the unwinder debug hook" (namely "_Unwind_DebugHook"). But GDB fails
to notice that the breakpoint was inserted for thread 1 only, and so
decides to handle it as...
infrun: BPSTAT_WHAT_SET_LONGJMP_RESUME
... and inserts a breakpoint at the corresponding resume address, as
evidenced by this the next log:
infrun: exception resume at 80542a2
That breakpoint seems innocent right now, but will play a role fairly
quickly. But for now, GDB has inserted the exception-resume
breakpoint, and needs to single-step thread 3 past the breakpoint it
just hit. Thus, it temporarily disables the exception breakpoint, and
requests a step of that thread:
infrun: skipping breakpoint: stepping past insn at: 0x80782d0
infrun: skipping breakpoint: stepping past insn at: 0x80782d0
infrun: skipping breakpoint: stepping past insn at: 0x80782d0
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 0xb7c5aba0 (LWP 28378)] at 0x80782d0
infrun: prepare_to_wait
We then get a notification, still from thread 3, that it's now past
that breakpoint...
infrun: prepare_to_wait
infrun: target_wait (-1.0.0, status) =
infrun: 28370.28378.0 [Thread 0xb7c5aba0 (LWP 28378)],
infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x8078424
... so we can resume what we were doing before, which is single-stepping
thread 1 until we get to a new line of code:
infrun: switching back to stepped thread
infrun: Switching context from Thread 0xb7c5aba0 (LWP 28378) to Thread 0xb7ea18c0 (LWP 28370)
infrun: expected thread still hasn't advanced
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 0xb7ea18c0 (LWP 28370)] at 0x8054523
The "resume" log above shows that we're resuming thread 1 from where
we left off (0x8054523). We get one more stop at 0x8054529, which is
still inside our stepping range so we go again. That's when we get
the following event, from thread 3:
infrun: prepare_to_wait
infrun: target_wait (-1.0.0, status) =
infrun: 28370.28378.0 [Thread 0xb7c5aba0 (LWP 28378)],
infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x80542a2
Now the stop_pc address is interesting, because it's the address of
"exception resume" breakpoint...
infrun: context switch
infrun: Switching context from Thread 0xb7ea18c0 (LWP 28370) to Thread 0xb7c5aba0 (LWP 28378)
infrun: BPSTAT_WHAT_CLEAR_LONGJMP_RESUME
... and since that location is at a different line of code, this is
where it decides the "next" operation should stop:
infrun: stop_waiting
[Switching to Thread 0xb7c5aba0 (LWP 28378)]
0x080542a2 in inte_tache_rt.ttache_rt (
<_task>=0x80968ec <inte_tache_rt_inst.tache2>)
at /[...]/inte_tache_rt.adb:54
54 end loop;
However, what GDB should have noticed earlier that the exception
breakpoint we hit was for a different thread, thus should have
single-stepped that thread out of the breakpoint _without_ inserting
the exception-return breakpoint, and then resumed the single-stepping
of the initial thread (thread 1) until that thread stepped out of its
stepping range.
This is what this patch does, and after applying it, GDB now correctly
stops on the next line of code.
The patch adds a C++ test that exercises this, both for setjmp/longjmp
and exception breakpoints. With an unpatched GDB it shows:
(gdb) next
[Switching to Thread 22445.22455]
thread_try_catch (arg=0x0) at /home/pedro/gdb/mygit/build/../src/gdb/testsuite/gdb.threads/next-other-thr-longjmp.c:59
59 catch (...)
(gdb) FAIL: gdb.threads/next-other-thr-longjmp.exp: next to line 1
next
/home/pedro/gdb/mygit/build/../src/gdb/infrun.c:4865: internal-error: process_event_stop_test: Assertion `ecs->event_thread->control.exception_resume_breakpoint != NULL' fa
iled.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) FAIL: gdb.threads/next-other-thr-longjmp.exp: next to line 2 (GDB internal error)
Resyncing due to internal error.
n
Tested on x86_64-linux, no regressions.
gdb/ChangeLog:
2015-08-05 Pedro Alves <palves@redhat.com>
Joel Brobecker <brobecker@adacore.com>
* breakpoint.c (bpstat_what) <bp_longjmp, bp_longjmp_call_dummy>
<bp_exception, bp_longjmp_resume, bp_exception_resume>: Handle the
case where BS->STOP is not set.
gdb/testsuite/ChangeLog:
2015-08-05 Pedro Alves <palves@redhat.com>
* gdb.threads/next-while-other-thread-longjmps.c: New file.
* gdb.threads/next-while-other-thread-longjmps.exp: New file.
bfd * elf.c (_bfd_elf_copy_private_bfd_data): Copy the sh_link and
sh_info fields of sections whose type has been changed to
SHT_NOBITS.
bin * doc/binutils.texi: Document that the --only-keep-debug option
to strip and objcopy preserves the section headers of stripped
sections.
tests * binutils-all/objcopy.exp (keep_debug_symbols_and_check_links):
New proc. Checks that debug-info-only binaries retain the
sh_link field in stripped sections.
Fixes a build error due to typedef redefinition with some compilers.
Also added missing copyright header.
gdb/
* nat/gdb_thread_db.h: Add copyright header.
Protect against multiple inclusion.