TL;DR - if we step an instruction that is as long as
decr_pc_after_break (1-byte on x86) right after removing the
breakpoint at PC, in non-stop mode, adjust_pc_after_break adjusts the
PC, but it shouldn't.
In non-stop mode, when a breakpoint is removed, it is moved to the
"moribund locations" list. This is because other threads that are
running may have tripped on that breakpoint as well, and we haven't
heard about it. When a trap is reported, we check if perhaps it was
such a deleted breakpoint that caused the trap. If so, we also need
to adjust the PC (decr_pc_after_break).
Now, say that, on x86:
- a breakpoint was placed at an address where we have an instruction
of the same length as decr_pc_after_break on this arch (1 on x86).
- the breakpoint is removed, and thus put on the moribund locations
list.
- the thread is single-stepped.
As there's no breakpoint inserted at PC anymore, the single-step
actually executes the 1-byte instruction normally. GDB should _not_
adjust the PC for the resulting SIGTRAP. But, adjust_pc_after_break
confuses the step SIGTRAP reported for this single-step as being a
SIGTRAP for the moribund location of the breakpoint that used to be at
the previous PC, and so infrun applies the decr_pc_after_break
adjustment incorrectly.
The confusion comes from the special case mentioned in the comment:
static void
adjust_pc_after_break (struct execution_control_state *ecs)
{
...
As a special case, we could have hardware single-stepped a
software breakpoint. In this case (prev_pc == breakpoint_pc),
we also need to back up to the breakpoint address. */
if (thread_has_single_step_breakpoints_set (ecs->event_thread)
|| !ptid_equal (ecs->ptid, inferior_ptid)
|| !currently_stepping (ecs->event_thread)
|| (ecs->event_thread->stepped_breakpoint
&& ecs->event_thread->prev_pc == breakpoint_pc))
regcache_write_pc (regcache, breakpoint_pc);
The condition that incorrectly triggers is the
"ecs->event_thread->prev_pc == breakpoint_pc" one.
Afterwards, the next resume resume re-executes an instruction that had
already executed, which if you're lucky, results in the inferior
crashing. If you're unlucky, you'll get silent bad behavior...
The fix is to remember that we stepped a breakpoint. Turns out the
only case we step a breakpoint instruction today isn't covered by the
testsuite. It's the case of a 'handle nostop" signal arriving while a
step is in progress _and_ we have a software watchpoint, which forces
always single-stepping. This commit extends sigstep.exp to cover
that, and adds a new test for the adjust_pc_after_break issue.
Tested on x86_64 Fedora 20, native and gdbserver.
gdb/
2014-10-28 Pedro Alves <palves@redhat.com>
PR gdb/12623
* gdbthread.h (struct thread_info) <stepped_breakpoint>: New
field.
* infrun.c (resume) <stepping breakpoint instruction>: Set the
thread's stepped_breakpoint field. Skip if reverse debugging.
Add comment.
(init_thread_stepping_state, handle_signal_stop): Clear the
thread's stepped_breakpoint field.
gdb/testsuite/
2014-10-28 Pedro Alves <palves@redhat.com>
PR gdb/12623
* gdb.base/sigstep.c (no_handler): New global.
(main): If 'no_handler is true, set the signal handlers to
SIG_IGN.
* gdb.base/sigstep.exp (breakpoint_over_handler): Add
with_sw_watch and no_handler parameters. Handle them.
(top level) <stepping over handler when stopped at a breakpoint
test>: Add a test axis for testing with a software watchpoint, and
another for testing with the signal handler set to SIG_IGN.
* gdb.base/step-sw-breakpoint-adjust-pc.c: New file.
* gdb.base/step-sw-breakpoint-adjust-pc.exp: New file.
I noticed that when I single-step into a signal handler with a
pending/queued signal, the following single-steps while the program is
in the signal handler leave $eflags.TF set. That means subsequent
continues will trap after one instruction, resulting in a spurious
SIGTRAP being reported to the user.
This is a kernel bug; I've reported it to kernel devs (turned out to
be a known bug). I'm seeing it on x86_64 Fedora 20 (Linux
3.16.4-200.fc20.x86_64), and I was told it's still not fixed upstream.
This commit extends gdb.base/sigstep.exp to cover this use case,
xfailed.
Here's what the bug looks like:
(gdb) start
Temporary breakpoint 1, main () at si-handler.c:48
48 setup ();
(gdb) next
50 global = 0; /* set break here */
Let's queue a signal, so we can step into the handler:
(gdb) handle SIGUSR1
Signal Stop Print Pass to program Description
SIGUSR1 Yes Yes Yes User defined signal 1
(gdb) queue-signal SIGUSR1
TF is not set:
(gdb) display $eflags
1: $eflags = [ PF ZF IF ]
Now step into the handler -- "si" does PTRACE_SINGLESTEP+SIGUSR1:
(gdb) si
sigusr1_handler (sig=0) at si-handler.c:31
31 {
1: $eflags = [ PF ZF IF ]
No TF yet. But another single-step...
(gdb) si
0x0000000000400621 31 {
1: $eflags = [ PF ZF TF IF ]
... ends up with TF left set. This results in PTRACE_CONTINUE
trapping after each instruction is executed:
(gdb) c
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x0000000000400624 in sigusr1_handler (sig=0) at si-handler.c:31
31 {
1: $eflags = [ PF ZF TF IF ]
(gdb) c
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
sigusr1_handler (sig=10) at si-handler.c:32
32 global = 0;
1: $eflags = [ PF ZF TF IF ]
(gdb)
Note that even another PTRACE_SINGLESTEP does not fix it:
(gdb) si
33 }
1: $eflags = [ PF ZF TF IF ]
(gdb)
Eventually, it gets "fixed" by the rt_sigreturn syscall, when
returning out of the handler:
(gdb) bt
#0 sigusr1_handler (sig=10) at si-handler.c:33
#1 <signal handler called>
#2 main () at si-handler.c:50
(gdb) set disassemble-next-line on
(gdb) si
0x0000000000400632 33 }
0x0000000000400631 <sigusr1_handler+17>: 5d pop %rbp
=> 0x0000000000400632 <sigusr1_handler+18>: c3 retq
1: $eflags = [ PF ZF TF IF ]
(gdb)
<signal handler called>
=> 0x0000003b36a358f0 <__restore_rt+0>: 48 c7 c0 0f 00 00 00 mov $0xf,%rax
1: $eflags = [ PF ZF TF IF ]
(gdb) si
<signal handler called>
=> 0x0000003b36a358f7 <__restore_rt+7>: 0f 05 syscall
1: $eflags = [ PF ZF TF IF ]
(gdb)
main () at si-handler.c:50
50 global = 0; /* set break here */
=> 0x000000000040066b <main+9>: c7 05 cb 09 20 00 00 00 00 00 movl $0x0,0x2009cb(%rip) # 0x601040 <global>
1: $eflags = [ PF ZF IF ]
(gdb)
The bug doesn't happen if we instead PTRACE_CONTINUE into the signal
handler -- e.g., set a breakpoint in the handler, queue a signal, and
"continue".
gdb/testsuite/
2014-10-28 Pedro Alves <palves@redhat.com>
PR gdb/17511
* gdb.base/sigstep.c (handler): Add a few more writes to 'done'.
* gdb.base/sigstep.exp (other_handler_location): New global.
(advance): Support stepping into the signal handler, and running
commands while in the handler.
(in_handler_map): New global.
(top level): In the advance test, add combinations for getting
into the handler with stepping commands, and for running commands
in the handler. Add comment descripting the advancei tests.
I noticed that "si" behaves differently when a "handle nostop" signal
arrives while the step is in progress, depending on whether the
program was stopped at a breakpoint when "si" was entered.
Specifically, in case GDB needs to step off a breakpoint, the handler
is skipped and the program stops in the next "mainline" instruction.
Otherwise, the "si" stops in the first instruction of the signal
handler.
I was surprised the testsuite doesn't catch this difference. Turns
out gdb.base/sigstep.exp covers a bunch of cases related to stepping
and signal handlers, but does not test stepi nor nexti, only
step/next/continue.
My first reaction was that stopping in the signal handler was the
correct thing to do, as it's where the next user-visible instruction
that is executed is. I considered then "nexti" -- a signal handler
could be reasonably considered a subroutine call to step over, it'd
seem intuitive to me that "nexti" would skip it.
But then, I realized that signals that arrive while a plain/line
"step" is in progress _also_ have their handler skipped. A user might
well be excused for being confused by this, given:
(gdb) help step
Step program until it reaches a different source line.
And the signal handler's sources will be in different source lines,
after all.
I think that having to explain that "stepi" steps into handlers, (and
that "nexti" wouldn't according to my reasoning above), while "step"
does not, is a sign of an awkward interface.
E.g., if a user truly is interested in stepping into signal handlers,
then it's odd that she has to either force the signal to "handle
stop", or recall to do "stepi" whenever such a signal might be
delivered. For that use case, it'd seem nicer to me if "step" also
stepped into handlers.
This suggests to me that we either need a global "step-into-handlers"
setting, or perhaps better, make "handle pass/nopass stop/nostop
print/noprint" have have an additional axis - "handle
stepinto/nostepinto", so that the user could configure whether
handlers for specific signals should be stepped into.
In any case, I think it's simpler (and thus better) for all step
commands to behave the same. This commit thus makes "si/ni" skip
handlers for "handle nostop" signals that arrive while the command was
already in progress, like step/next do.
To be clear, nothing changes if the program was stopped for a signal,
and the user enters a stepping command _then_ -- GDB still steps into
the handler. The change concerns signals that don't cause a stop and
that arrive while the step is in progress.
Tested on x86_64 Fedora 20, native and gdbserver.
gdb/
2014-10-27 Pedro Alves <palves@redhat.com>
* infrun.c (handle_signal_stop): Also skip handlers when a random
signal arrives while handling a "stepi" or a "nexti". Set the
thread's 'step_after_step_resume_breakpoint' flag.
gdb/doc/
2014-10-27 Pedro Alves <palves@redhat.com>
* gdb.texinfo (Continuing and Stepping): Add cross reference to
info on stepping and signal handlers.
(Signals): Explain stepping and signal handlers. Add context
index entry, and cross references.
gdb/testsuite/
2014-10-27 Pedro Alves <palves@redhat.com>
* gdb.base/sigstep.c (dummy): New global.
(main): Issue a couple writes to the new global.
* gdb.base/sigstep.exp (get_next_pc, test_skip_handler): New
procedures.
(skip_over_handler): Use test_skip_handler.
(top level): Call skip_over_handler for stepi and nexti too.
(breakpoint_over_handler): Use test_skip_handler.
(top level): Call breakpoint_over_handler for stepi and nexti too.
Two modifications:
1. The addition of 2013 to the copyright year range for every file;
2. The use of a single year range, instead of potentially multiple
year ranges, as approved by the FSF.
* gdb.base/sigstep.c (main): Add checks for
return values for setitimer call.
Call setitimer again with itimer = ITIMER_REAL
if first call to setitimer fails.
* gdb.base/sigstep.exp (breakpoint_to_handler, skip_to_handler)
(skip_over_handler, breakpoint_over_hander): New test procedures.
(advance, advancei): Add a proper prefix, do not use
rerun_to_main.
* gdb.base/sigstep.c (main): Change to use an infinite loop.
* gdb.base/freebpcmd.c: Include <stdio.h>.
* gdb.base/long_long.c: Include <string.h>.
* gdb.base/sigaltstack.c: Include <stdlib.h> <string.h>.
* gdb.base/siginfo.c: Include <string.h>.
* gdb.base/sigstep.c: Include <string.h>.