Even with the previous patch installed, we'll still see
sigall-reverse.exp occasionally fail. The problem is that the event
loop's event handling processing is done in two steps:
#1 - poll all event sources, and push new event objects to the event
queue, until all event sources are drained.
#2 - go through the event queue, processing each event object at a
time. For each event, call the associated callback, and deletes the
event object from the queue.
and then bad things happen if between #1 and #2 something decides that
events from an event source that has already queued events shouldn't
be processed yet. To do that, we either remove the event source from
the list of event sources, or clear its "have events" flag. However,
if an event for that source has meanwhile already been pushed in the
event queue, #2 will still process it and call the associated
callback...
One way to fix it that I considered was to do something to the event
objects already in the event queue when an event source is no longer
interesting. But then I couldn't find any good reason for the
two-step process in the first place. It's much simpler (and less
code) to call the event source callbacks as we poll the sources and
find events.
Tested on x86-64 Fedora 20, native and gdbserver.
gdb/
2015-02-03 Pedro Alves <palves@redhat.com>
* event-loop.c: Don't declare nor define a queue type for
gdb_event_p.
(event_queue): Delete.
(create_event, create_file_event, gdb_event_xfree)
(initialize_event_loop, process_event): Delete.
(gdb_do_one_event): Return as soon as one event is handled.
(handle_file_event): Change prototype. Used the passed in
file_handler pointer and ready_mask instead of looping over all
file handlers.
(gdb_wait_for_event): Update the poll/select timeouts before
blocking. Run event handlers directly instead of queueing events.
Return as soon as one event is handled.
(struct async_event_handler_data): Delete.
(invoke_async_event_handler): Delete.
(check_async_event_handlers): Change return type to int. Run
event handlers directly instead of queueing events. Return as
soon as one event is handled.
(handle_timer_event): Delete.
(update_wait_timeout): New function, factored out from
poll_timers.
(poll_timers): Reimplement.
* event-loop.h (initialize_event_loop): Delete declaration.
* top.c (gdb_init): Don't call initialize_event_loop.
The sigall-reverse.exp test occasionally fails with something like this:
(gdb) PASS: gdb.reverse/sigall-reverse.exp: send signal TERM
continue
Continuing.
The next instruction is syscall exit_group. It will make the program exit. Do you want to stop the program?([y] or n) FAIL: gdb.reverse/sigall-reverse.exp: continue to signal exit (timeout)
FAIL: gdb.reverse/sigall-reverse.exp: reverse to handler of TERM (timeout)
FAIL: gdb.reverse/sigall-reverse.exp: reverse to gen_TERM (timeout)
This is another event-loop/async related problem exposed by the patch
that made 'query' use gdb_readline_wrapper (588dcc3edb).
The problem is that even though gdb_readline_wrapper disables
target-async while the secondary prompt is in progress, the record
target's async event source is left marked. So when
gdb_readline_wrapper nests an event loop to process input, it may
happen that that event loop ends up processing a target event while
GDB is not really ready for it. Here's the relevant part of the
backtrace showing the root issue in action:
...
#14 0x000000000061cb48 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:4158
#15 0x0000000000642917 in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:57
#16 0x000000000077ca5c in record_full_async_inferior_event_handler (data=0x0) at src/gdb/record-full.c:791
#17 0x0000000000640fdf in invoke_async_event_handler (data=...) at src/gdb/event-loop.c:1067
#18 0x000000000063fb01 in process_event () at src/gdb/event-loop.c:339
#19 0x000000000063fb2a in gdb_do_one_event () at src/gdb/event-loop.c:360
#20 0x000000000074d607 in gdb_readline_wrapper (prompt=0x3588f40 "The next instruction is syscall exit_group. It will make the program exit. Do you want to stop the program?([y] or n) ") at src/gdb/top.c:842
#21 0x0000000000750bd9 in defaulted_query (ctlstr=0x8c6588 "The next instruction is syscall exit_group. It will make the program exit. Do you want to stop the program?", defchar=121 'y', args=0x7fff70524410) at src/gdb/utils.c:1279
#22 0x0000000000750e4c in yquery (ctlstr=0x8c6588 "The next instruction is syscall exit_group. It will make the program exit. Do you want to stop the program?") at src/gdb/utils.c:1358
#23 0x00000000004b020e in record_linux_system_call (syscall=gdb_sys_exit_group, regcache=0x3529450, tdep=0xd6c840 <amd64_linux_record_tdep>) at src/gdb/linux-record.c:1933
With my all-stop-on-top-of-non-stop series, I'm also seeing
gdb.server/ext-attach.exp fail occasionally due to the same issue.
The first part of the fix is for target_async implementations to make
sure to remove/unmark all target-related event sources from the event
loop.
Tested on x86_64 Fedora 20, native and gdbserver.
gdb/
2015-02-03 Pedro Alves <palves@redhat.com>
* event-loop.c (clear_async_event_handler): New function.
* event-loop.h (clear_async_event_handler): New declaration.
* record-btrace.c (record_btrace_async): New function.
(init_record_btrace_ops): Install record_btrace_async.
* record-full.c (record_full_async): New function.
(record_full_resume): Don't mark the async event source here.
(init_record_full_ops): Install record_full_async.
(record_full_core_resume): Don't mark the async event source here.
(init_record_full_core_ops): Install record_full_async.
* remote.c (remote_async): Mark and clear the async stop reply
queue event-loop token as appropriate.
Two modifications:
1. The addition of 2013 to the copyright year range for every file;
2. The use of a single year range, instead of potentially multiple
year ranges, as approved by the FSF.
* event-loop.c: ... here.
* tui/tui-io.c (tui_readline_output): Rename parameter `code' to
`error' for clarity.
(tui_getc): Pass correct value for `error' parameter to
tui_readline_output.
(async_event_handler): Forward declare.
(async_event_handler_func): New typedef.
(create_async_event_handler, delete_async_event_handler)
(mark_async_event_handler): Declare.
* event-loop.c (event_data): New.
(event_handler_func): Take an event_data instead of an integer.
(struct gdb_event): Replace the integer file descriptor by a
generic event_data.
(async_event_handler): New.
(async_handler_ready): Delete.
(async_event_handler_list): New.
(create_event): New.
(create_file_event): Use it.
(process_event): Adjust.
(gdb_do_one_event): Poll from the event sources in round-robin
fashion across calls. Be sure to consult all sources before
blocking.
(handle_file_event): Take an event_data instead of an integer.
Adjust.
(gdb_wait_for_event): Add `block' argument. Handle it.
(mark_async_signal_handler): Remove unneeded cast.
(invoke_async_signal_handler): Rename to ...
(invoke_async_signal_handlres): ... this. Return true if any was
handled.
(check_async_ready): Delete
(create_async_event_handler): New.
(mark_async_event_handler): New.
(struct async_event_handler_data): New.
(invoke_async_event_handler): New.
(check_async_event_handlers): New.
(delete_async_event_handler): New.
(handle_timer_event): Adjust.
* defs.h (struct continuation_arg): Change type of field 'data'
from PTR to void *.
* event-loop.h: Eliminate uses of PTR, use 'void *' instead.
* event-top.c: Ditto.