Keith reported that gdb.base/dso2dso.exp is broken, with the following
error:
| $ make check RUNTESTFLAGS=dso2dso.exp
| [snip]
| Running ../../../src/gdb/testsuite/gdb.base/dso2dso.exp ...
| ERROR: tcl error sourcing ../../../src/gdb/testsuite/gdb.base/dso2dso.exp.
| ERROR: couldn't open
| "../../../src/gdb/testsuite/gdb.base/../../../src/gdb/testsuite/gdb.base/dso2dso-dso1.c":
| no such file or directory
| while executing
| "error "$message""
| (procedure "gdb_get_line_number" line 14)
| invoked from within
| "gdb_get_line_number "STOP HERE" $srcfile_libdso1"
| (file "../../../src/gdb/testsuite/gdb.base/dso2dso.exp" line 60)
| invoked from within
| "source ../../../src/gdb/testsuite/gdb.base/dso2dso.exp"
| ("uplevel" body line 1)
| invoked from within
| "uplevel #0 source ../../../src/gdb/testsuite/gdb.base/dso2dso.exp"
| invoked from within
| "catch "uplevel #0 source $test_file_name""
This happens because gdb_get_line_number prepends $srcdir/$subdir
to the given filename unless it starts with "/". The test passes a
filename that already includes that prefix, so when GDB was configured
using a relative path to the configure script, the prefix gets added
a second time. When using an absolute path like I do, the filename
starts with "/" and we avoid the prepending that Keith is seeing.
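As a rough illustration of the rule described above, here is a small
C sketch. It is not the testsuite code (gdb_get_line_number is a Tcl
procedure); the names resolve_source_path and base_name are
hypothetical, and the srcdir/subdir values mentioned below are assumed
from the error message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the prefixing rule: FILE is used as-is
   when it starts with "/", otherwise SRCDIR/SUBDIR is prepended.  */
static void
resolve_source_path (const char *srcdir, const char *subdir,
                     const char *file, char *out, size_t outlen)
{
  assert (srcdir != NULL && subdir != NULL && file != NULL && out != NULL);

  if (file[0] == '/')
    snprintf (out, outlen, "%s", file);
  else
    snprintf (out, outlen, "%s/%s/%s", srcdir, subdir, file);
}

/* Hypothetical helper: return the part of PATH after the last "/",
   or PATH itself if it contains no "/".  */
static const char *
base_name (const char *path)
{
  const char *slash = strrchr (path, '/');

  return slash != NULL ? slash + 1 : path;
}
```

Assuming srcdir is "../../../src/gdb/testsuite" and subdir is
"gdb.base", passing the already-prefixed relative filename produces
the doubled path from the error, whereas passing its basename
produces a path with a single prefix.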
gdb/testsuite/ChangeLog:
Keith Seitz <keiths@redhat.com>:
* gdb.base/dso2dso.exp: Pass basename of source file in call
to gdb_get_line_number.
Tested on x86_64-linux with both scenarios.
Making all-stop run on top of non-stop caused a small regression
in behavior. This was observed on x86_64-linux. The attached testcase
is in C whereas the investigation was done with an Ada program,
but it's the same scenario, and using a C testcase allows wider testing.
Basically: I am debugging a single-threaded program that is currently
stopped inside a function provided by a shared library, at a line
calling a subprogram provided by a second shared library, and I am
trying to "next" over that function call.
Before we changed the default all-stop behavior, we had:
7 Impl_Initialize; -- Stop here and try "next" over this line
(gdb) n
8 return 5; <<-- OK
But now, "next" just stops much earlier:
(gdb) n
0x00007ffff7bd8560 in impl.initialize@plt () from /[...]/lib/libpck.so
What happens is that "next" stops right after executing a call
instruction, inside the called function's PLT entry: GDB fails to
notice that the inferior stepped into a subroutine, and so decides
that we're done. We can see another symptom of the same issue by
looking at the backtrace at the point GDB stopped:
(gdb) bt
#0 0x00007ffff7bd8560 in impl.initialize@plt ()
from /[...]/lib/libpck.so
#1 0x00000000f7bd86f9 in ?? ()
#2 0x00007fffffffdf50 in ?? ()
#3 0x0000000000401893 in a () at /[...]/a.adb:7
Backtrace stopped: frame did not save the PC
With a functioning GDB, the backtrace looks like the following instead:
#0 0x00007ffff7bd8560 in impl.initialize@plt ()
from /[...]/lib/libpck.so
#1 0x00007ffff7bd86f9 in sub () at /[...]/pck.adb:7
#2 0x0000000000401893 in a () at /[...]/a.adb:7
Note how, for frame #1, the address looks quite similar, except
for the high-order bits not being set:
#1 0x00007ffff7bd86f9 in sub () at /[...]/pck.adb:7 <<<-- OK
#1 0x00000000f7bd86f9 in ?? ()                      <<<-- WRONG
         ^^^^
         ||||
         Wrong
Investigating this further led me to displaced stepping.
As we are "next"-ing from a location where a breakpoint is inserted,
we need to step out of it, and since we're in non-stop mode, we need
to do it using displaced stepping. And looking at
amd64-tdep.c:amd64_displaced_step_fixup, I found the code that handles
the return address:
regcache_cooked_read_unsigned (regs, AMD64_RSP_REGNUM, &rsp);
retaddr = read_memory_unsigned_integer (rsp, retaddr_len, byte_order);
retaddr = (retaddr - insn_offset) & 0xffffffffUL;
The mask used to compute retaddr looks wrong to me: it keeps only
4 bytes instead of 8, which explains why the high-order bits of
the frame #1 address in the backtrace are unset. What happens is that,
after the displaced stepping has completed, GDB restores that return
address at the location where the program expects it. But because the
top half of the address has been masked out, the return address is
now invalid.
The incorrect behavior of the "next" command and the backtrace at
that location are the first symptoms of that. Another symptom is
that this actually alters the behavior of the program, where a "cont"
from there soon leads to a SEGV when the inferior tries to jump back
to that incorrect return address:
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00000000f7bd86f9 in ?? ()
^^^^^^^^^^^^^^^^^^
This patch fixes the issue by using a mask that keeps all 8 bytes of
the return address, which seems more appropriate for this architecture.
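To illustrate the effect of the mask on the address seen in the
backtrace, here is a small standalone C sketch. It is not the GDB
code: fixup_retaddr and mask_example are hypothetical names, the
insn_offset value is made up, and the full 64-bit mask is an
assumption about what "more appropriate" means here.

```c
#include <assert.h>
#include <stdint.h>

#define MASK_32 UINT64_C (0x00000000ffffffff)   /* mask quoted above */
#define MASK_64 UINT64_C (0xffffffffffffffff)   /* assumed 8-byte mask */

/* Same arithmetic as the retaddr computation quoted above: relocate
   the pushed return address by INSN_OFFSET, then apply MASK.  */
static uint64_t
fixup_retaddr (uint64_t pushed, uint64_t insn_offset, uint64_t mask)
{
  return (pushed - insn_offset) & mask;
}

/* Check both masks against the frame #1 address from the backtrace.  */
static void
mask_example (void)
{
  const uint64_t expected = UINT64_C (0x00007ffff7bd86f9);
  const uint64_t insn_offset = UINT64_C (0x1000);  /* made-up offset */
  const uint64_t pushed = expected + insn_offset;

  /* The 4-byte mask drops the upper half of the address.  */
  assert (fixup_retaddr (pushed, insn_offset, MASK_32)
          == UINT64_C (0x00000000f7bd86f9));

  /* An 8-byte mask preserves the full return address.  */
  assert (fixup_retaddr (pushed, insn_offset, MASK_64) == expected);
}
```

With the 4-byte mask, the result is the 0x00000000f7bd86f9 value that
shows up both in the broken backtrace and in the SIGSEGV report.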
gdb/ChangeLog:
* amd64-tdep.c (amd64_displaced_step_fixup): Fix the mask used to
compute RETADDR.
gdb/testsuite/ChangeLog:
* gdb.base/dso2dso-dso2.c, gdb.base/dso2dso-dso2.h,
gdb.base/dso2dso-dso1.c, gdb.base/dso2dso-dso1.h, gdb.base/dso2dso.c,
gdb.base/dso2dso.exp: New files.
Tested on x86_64-linux, no regression.