[amd64] Invalid return address after displaced stepping
Making all-stop run on top of non-stop caused a small regression
in behavior. This was observed on x86_64-linux. The attached testcase
is in C whereas the investigation was done with an Ada program,
but it's the same scenario, and using a C testcase allows wider testing.
Basically: I am debugging a single-threaded program, and currently
stopped inside a function provided by a shared-library, at a line
calling a subprogram provided by a second shared library, and trying
to "next" over that function call.
Before we changed the default all-stop behavior, we had:
7 Impl_Initialize; -- Stop here and try "next" over this line
(gdb) n
8 return 5; <<-- OK
But now, "next" just stops much earlier:
(gdb) n
0x00007ffff7bd8560 in impl.initialize@plt () from /[...]/lib/libpck.so
What happens is that next stops at a call instruction, which calls
the function's PLT, and GDB fails to notice that the inferior stepped
into a subroutine, and so decides that we're done. We can see another
symptom of the same issue by looking at the backtrace at the point
GDB stopped:
(gdb) bt
#0 0x00007ffff7bd8560 in impl.initialize@plt ()
from /[...]/lib/libpck.so
#1 0x00000000f7bd86f9 in ?? ()
#2 0x00007fffffffdf50 in ?? ()
#3 0x0000000000401893 in a () at /[...]/a.adb:7
Backtrace stopped: frame did not save the PC
With a functioning GDB, the backtrace looks like the following instead:
#0 0x00007ffff7bd8560 in impl.initialize@plt ()
from /[...]/lib/libpck.so
#1 0x00007ffff7bd86f9 in sub () at /[...]/pck.adb:7
#2 0x0000000000401893 in a () at /[...]/a.adb:7
Note how, for frame #1, the address looks quite similar, except
for the high-order bits not being set:
#1 0x00007ffff7bd86f9 in sub () at /[...]/pck.adb:7 <<<-- OK
#1 0x00000000f7bd86f9 in ?? () <<<-- WRONG
^^^^
||||
Wrong
Investigating this further led me to displaced stepping.
As we are "next"-ing from a location where a breakpoint is inserted,
we need to step out of it, and since we're on non-stop mode, we need
to do it using displaced stepping. And looking at
amd64-tdep.c:amd64_displaced_step_fixup, I found the code that handles
the return address:
regcache_cooked_read_unsigned (regs, AMD64_RSP_REGNUM, &rsp);
retaddr = read_memory_unsigned_integer (rsp, retaddr_len, byte_order);
retaddr = (retaddr - insn_offset) & 0xffffffffUL;
The mask used to compute retaddr looks wrong to me, keeping only
4 bytes instead of 8, and explains why the high order bits of
the backtrace are unset. What happens is that, after the displaced
stepping has completed, GDB restores that return address at the location
where the program expects it. But because the top half bits of
the address have been masked out, the return address is now invalid.
The incorrect behavior of the "next" command and the backtrace at
that location are the first symptoms of that. Another symptom is
that this actually alters the behavior of the program, where a "cont"
from there soon leads to a SEGV when the inferior tries to jump back
to that incorrect return address:
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00000000f7bd86f9 in ?? ()
^^^^^^^^^^^^^^^^^^
This patch fixes the issue by using a mask that seems more appropriate
for this architecture.
gdb/ChangeLog:
* amd64-tdep.c (amd64_displaced_step_fixup): Fix the mask used to
compute RETADDR.
gdb/testsuite/ChangeLog:
* gdb.base/dso2dso-dso2.c, gdb.base/dso2dso-dso2.h,
gdb.base/dso2dso-dso1.c, gdb.base/dso2dso-dso1.h, gdb.base/dso2dso.c,
gdb.base/dso2dso.exp: New files.
Tested on x86_64-linux, no regression.
2015-08-12 16:33:19 +00:00
|
|
|
# Copyright 2015 Free Software Foundation, Inc.
|
|
|
|
|
|
|
|
# This program is free software; you can redistribute it and/or modify
|
|
|
|
# it under the terms of the GNU General Public License as published by
|
|
|
|
# the Free Software Foundation; either version 3 of the License, or
|
|
|
|
# (at your option) any later version.
|
|
|
|
#
|
|
|
|
# This program is distributed in the hope that it will be useful,
|
|
|
|
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
# GNU General Public License for more details.
|
|
|
|
#
|
|
|
|
# You should have received a copy of the GNU General Public License
|
|
|
|
# along with this program. If not, see <http://www.gnu.org/licenses/>.
|
|
|
|
|
2015-08-12 20:40:54 +00:00
|
|
|
# The purpose of this testcase is to verify that we can "next" over
|
|
|
|
# a call to a function provided by one shared library made from another
|
|
|
|
# shared library, and that GDB stops at the expected location. In this
|
|
|
|
# case, the call is made from sub1 (provided by libdso1) and we are
|
|
|
|
# calling sub2 (provided by libdso2).
|
|
|
|
#
|
|
|
|
# Note that, while this is not the main purpose of this testcase, it
|
|
|
|
# also happens to exercise an issue with displaced stepping on amd64
|
|
|
|
# when libdso1 is mapped at an address greater than 0xffffffff.
|
|
|
|
|
[amd64] Invalid return address after displaced stepping
Making all-stop run on top of non-stop caused a small regression
in behavior. This was observed on x86_64-linux. The attached testcase
is in C whereas the investigation was done with an Ada program,
but it's the same scenario, and using a C testcase allows wider testing.
Basically: I am debugging a single-threaded program, and currently
stopped inside a function provided by a shared-library, at a line
calling a subprogram provided by a second shared library, and trying
to "next" over that function call.
Before we changed the default all-stop behavior, we had:
7 Impl_Initialize; -- Stop here and try "next" over this line
(gdb) n
8 return 5; <<-- OK
But now, "next" just stops much earlier:
(gdb) n
0x00007ffff7bd8560 in impl.initialize@plt () from /[...]/lib/libpck.so
What happens is that next stops at a call instruction, which calls
the function's PLT, and GDB fails to notice that the inferior stepped
into a subroutine, and so decides that we're done. We can see another
symptom of the same issue by looking at the backtrace at the point
GDB stopped:
(gdb) bt
#0 0x00007ffff7bd8560 in impl.initialize@plt ()
from /[...]/lib/libpck.so
#1 0x00000000f7bd86f9 in ?? ()
#2 0x00007fffffffdf50 in ?? ()
#3 0x0000000000401893 in a () at /[...]/a.adb:7
Backtrace stopped: frame did not save the PC
With a functioning GDB, the backtrace looks like the following instead:
#0 0x00007ffff7bd8560 in impl.initialize@plt ()
from /[...]/lib/libpck.so
#1 0x00007ffff7bd86f9 in sub () at /[...]/pck.adb:7
#2 0x0000000000401893 in a () at /[...]/a.adb:7
Note how, for frame #1, the address looks quite similar, except
for the high-order bits not being set:
#1 0x00007ffff7bd86f9 in sub () at /[...]/pck.adb:7 <<<-- OK
#1 0x00000000f7bd86f9 in ?? () <<<-- WRONG
^^^^
||||
Wrong
Investigating this further led me to displaced stepping.
As we are "next"-ing from a location where a breakpoint is inserted,
we need to step out of it, and since we're on non-stop mode, we need
to do it using displaced stepping. And looking at
amd64-tdep.c:amd64_displaced_step_fixup, I found the code that handles
the return address:
regcache_cooked_read_unsigned (regs, AMD64_RSP_REGNUM, &rsp);
retaddr = read_memory_unsigned_integer (rsp, retaddr_len, byte_order);
retaddr = (retaddr - insn_offset) & 0xffffffffUL;
The mask used to compute retaddr looks wrong to me, keeping only
4 bytes instead of 8, and explains why the high order bits of
the backtrace are unset. What happens is that, after the displaced
stepping has completed, GDB restores that return address at the location
where the program expects it. But because the top half bits of
the address have been masked out, the return address is now invalid.
The incorrect behavior of the "next" command and the backtrace at
that location are the first symptoms of that. Another symptom is
that this actually alters the behavior of the program, where a "cont"
from there soon leads to a SEGV when the inferior tries to jump back
to that incorrect return address:
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00000000f7bd86f9 in ?? ()
^^^^^^^^^^^^^^^^^^
This patch fixes the issue by using a mask that seems more appropriate
for this architecture.
gdb/ChangeLog:
* amd64-tdep.c (amd64_displaced_step_fixup): Fix the mask used to
compute RETADDR.
gdb/testsuite/ChangeLog:
* gdb.base/dso2dso-dso2.c, gdb.base/dso2dso-dso2.h,
gdb.base/dso2dso-dso1.c, gdb.base/dso2dso-dso1.h, gdb.base/dso2dso.c,
gdb.base/dso2dso.exp: New files.
Tested on x86_64-linux, no regression.
2015-08-12 16:33:19 +00:00
|
|
|
if { [skip_shlib_tests] } {
|
|
|
|
return 0
|
|
|
|
}
|
|
|
|
|
|
|
|
standard_testfile
|
|
|
|
|
|
|
|
set output_dir [standard_output_file {}]
|
|
|
|
|
|
|
|
set libdso2 $testfile-dso2
|
|
|
|
set srcfile_libdso2 $srcdir/$subdir/$libdso2.c
|
|
|
|
set binfile_libdso2 [standard_output_file $libdso2.so]
|
|
|
|
|
|
|
|
set libdso1 $testfile-dso1
|
|
|
|
set srcfile_libdso1 $srcdir/$subdir/$libdso1.c
|
|
|
|
set binfile_libdso1 [standard_output_file $libdso1.so]
|
|
|
|
|
|
|
|
if { [gdb_compile_shlib $srcfile_libdso2 $binfile_libdso2 \
|
|
|
|
[list debug additional_flags=-fPIC]] != "" } {
|
|
|
|
untested "Could not compile $binfile_libdso2."
|
|
|
|
return -1
|
|
|
|
}
|
|
|
|
|
|
|
|
if { [gdb_compile_shlib $srcfile_libdso1 $binfile_libdso1 \
|
|
|
|
[list debug additional_flags=-fPIC]] != "" } {
|
|
|
|
untested "Could not compile $binfile_libdso1."
|
|
|
|
return -1
|
|
|
|
}
|
|
|
|
|
|
|
|
if { [gdb_compile $srcdir/$subdir/$srcfile $binfile executable \
|
|
|
|
[list debug shlib=$binfile_libdso2 shlib=$binfile_libdso1]] != "" } {
|
|
|
|
return -1
|
|
|
|
}
|
|
|
|
|
|
|
|
clean_restart $binfile
|
|
|
|
gdb_load_shlibs $binfile_libdso2 $binfile_libdso1
|
|
|
|
|
|
|
|
if { ![runto_main] } {
|
|
|
|
return -1
|
|
|
|
}
|
|
|
|
|
2015-08-13 01:31:11 +00:00
|
|
|
set bp_location [gdb_get_line_number "STOP HERE" [file tail $srcfile_libdso1]]
|
[amd64] Invalid return address after displaced stepping
Making all-stop run on top of non-stop caused a small regression
in behavior. This was observed on x86_64-linux. The attached testcase
is in C whereas the investigation was done with an Ada program,
but it's the same scenario, and using a C testcase allows wider testing.
Basically: I am debugging a single-threaded program, and currently
stopped inside a function provided by a shared-library, at a line
calling a subprogram provided by a second shared library, and trying
to "next" over that function call.
Before we changed the default all-stop behavior, we had:
7 Impl_Initialize; -- Stop here and try "next" over this line
(gdb) n
8 return 5; <<-- OK
But now, "next" just stops much earlier:
(gdb) n
0x00007ffff7bd8560 in impl.initialize@plt () from /[...]/lib/libpck.so
What happens is that next stops at a call instruction, which calls
the function's PLT, and GDB fails to notice that the inferior stepped
into a subroutine, and so decides that we're done. We can see another
symptom of the same issue by looking at the backtrace at the point
GDB stopped:
(gdb) bt
#0 0x00007ffff7bd8560 in impl.initialize@plt ()
from /[...]/lib/libpck.so
#1 0x00000000f7bd86f9 in ?? ()
#2 0x00007fffffffdf50 in ?? ()
#3 0x0000000000401893 in a () at /[...]/a.adb:7
Backtrace stopped: frame did not save the PC
With a functioning GDB, the backtrace looks like the following instead:
#0 0x00007ffff7bd8560 in impl.initialize@plt ()
from /[...]/lib/libpck.so
#1 0x00007ffff7bd86f9 in sub () at /[...]/pck.adb:7
#2 0x0000000000401893 in a () at /[...]/a.adb:7
Note how, for frame #1, the address looks quite similar, except
for the high-order bits not being set:
#1 0x00007ffff7bd86f9 in sub () at /[...]/pck.adb:7 <<<-- OK
#1 0x00000000f7bd86f9 in ?? () <<<-- WRONG
^^^^
||||
Wrong
Investigating this further led me to displaced stepping.
As we are "next"-ing from a location where a breakpoint is inserted,
we need to step out of it, and since we're on non-stop mode, we need
to do it using displaced stepping. And looking at
amd64-tdep.c:amd64_displaced_step_fixup, I found the code that handles
the return address:
regcache_cooked_read_unsigned (regs, AMD64_RSP_REGNUM, &rsp);
retaddr = read_memory_unsigned_integer (rsp, retaddr_len, byte_order);
retaddr = (retaddr - insn_offset) & 0xffffffffUL;
The mask used to compute retaddr looks wrong to me, keeping only
4 bytes instead of 8, and explains why the high order bits of
the backtrace are unset. What happens is that, after the displaced
stepping has completed, GDB restores that return address at the location
where the program expects it. But because the top half bits of
the address have been masked out, the return address is now invalid.
The incorrect behavior of the "next" command and the backtrace at
that location are the first symptoms of that. Another symptom is
that this actually alters the behavior of the program, where a "cont"
from there soon leads to a SEGV when the inferior tries to jump back
to that incorrect return address:
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00000000f7bd86f9 in ?? ()
^^^^^^^^^^^^^^^^^^
This patch fixes the issue by using a mask that seems more appropriate
for this architecture.
gdb/ChangeLog:
* amd64-tdep.c (amd64_displaced_step_fixup): Fix the mask used to
compute RETADDR.
gdb/testsuite/ChangeLog:
* gdb.base/dso2dso-dso2.c, gdb.base/dso2dso-dso2.h,
gdb.base/dso2dso-dso1.c, gdb.base/dso2dso-dso1.h, gdb.base/dso2dso.c,
gdb.base/dso2dso.exp: New files.
Tested on x86_64-linux, no regression.
2015-08-12 16:33:19 +00:00
|
|
|
gdb_breakpoint ${srcfile_libdso1}:${bp_location}
|
|
|
|
|
|
|
|
gdb_continue_to_breakpoint "at call to sub2" \
|
|
|
|
".*sub2 ().*"
|
|
|
|
|
|
|
|
gdb_test "next" \
|
|
|
|
".*return 5;.*" \
|
|
|
|
"next over call to sub2"
|