# Makefile for regression testing the GNU debugger.
# Copyright 1992-2016 Free Software Foundation, Inc.

# This file is part of GDB.

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

VPATH = @srcdir@
srcdir = @srcdir@
prefix = @prefix@
exec_prefix = @exec_prefix@
abs_builddir = @abs_builddir@
Fix in-tree, parallel running of Ada tests
While testing the following patch,
[PATCH] Always organize test artifacts in a directory hierarchy
https://sourceware.org/ml/gdb-patches/2016-01/msg00133.html
I noticed that it broke Ada testing. This led me to think that
parallel testing when building in-tree didn't work previously in Ada.
It is confirmed by this test:
$ make check TESTS="gdb.ada/fun_addr.exp" -j 2
...
Running ./gdb.ada/fun_addr.exp ...
FAIL: gdb.ada/fun_addr.exp: compilation foo.adb
...
This patch fixes in-tree parallel testing for Ada, and consequently
serial and parallel testing when the aforementioned patch is applied.
The problem originates from the fact that Ada support code cd's to the
builddir before compiling. In itself that is not a problem; it allows
intermediate auto-generated files to be placed in that directory. The Ada
compilation refers to the source file, which is in another directory,
only by its base name (e.g. foo.adb). In serial mode, that worked
because builddir was the same as the source directory (e.g.
gdb.ada/fun_addr/). In an out-of-tree build, it works because the
source directory is added as an include directory (note: this is not the
same $srcdir as autoconf's):
set srcdir [file dirname $source]
additional_flags=-I$srcdir
which becomes:
additional_flags=-I/home/emaisin/build/binutils-gdb/gdb/testsuite/gdb.ada/fun_addr
However, when building in-tree, srcdir is relative: ./gdb.ada/fun_addr.
When using parallel or always-in-outputs-directory mode, we are cd'ed into
the outputs directory. So -I$srcdir is relative to the current
directory, which is wrong.
To fix it, I made the TCL variable srcdir (set in site.exp, from which
everything else is derived) always absolute. It is done by assigning
autoconf's abs_srcdir instead of autoconf's srcdir. This way -I$srcdir
will always be good, regardless of where we cd'ed to. A small apparent
change is that when running tests, DejaGnu will say:
Running /tmp/binutils-gdb/gdb/testsuite/gdb.ada/fun_addr.exp ...
instead of
Running ./gdb.ada/fun_addr.exp ...
I hope it's not too much of an annoyance. I think that it should make
the testsuite a tiny bit more robust against other bugs of the same
class.
Regtested in & out of tree, only with native target.
gdb/testsuite/ChangeLog:
* Makefile.in (abs_srcdir): Assign @abs_srcdir@.
(site.exp): Assign abs_srcdir to tcl's srcdir.
2016-02-08 19:00:49 +00:00
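A minimal Python sketch of the failure described above: a relative `-I` directory is interpreted against whatever the current directory is, so it goes wrong once the Ada support code cd's elsewhere, while an absolute one does not. The directory layout and the `resolve` helper are hypothetical; only the relative/absolute distinction comes from the message.

```python
import posixpath


def resolve(cwd: str, include_dir: str) -> str:
    """Directory a compiler would search for -I<include_dir> when run from cwd."""
    # posixpath.join ignores cwd when include_dir is already absolute.
    return posixpath.normpath(posixpath.join(cwd, include_dir))


# Hypothetical in-tree layout; only the relative/absolute distinction matters.
testsuite = "/src/gdb/testsuite"
rel_srcdir = "./gdb.ada/fun_addr"                  # srcdir derived from autoconf's srcdir
abs_srcdir = testsuite + "/gdb.ada/fun_addr"       # srcdir derived from @abs_srcdir@
outputs = testsuite + "/outputs/gdb.ada/fun_addr"  # hypothetical directory the support code cd's to

# The relative path was computed relative to the testsuite directory...
print(resolve(testsuite, rel_srcdir))  # /src/gdb/testsuite/gdb.ada/fun_addr
# ...but after the cd it names a different, wrong directory.
print(resolve(outputs, rel_srcdir))    # /src/gdb/testsuite/outputs/gdb.ada/fun_addr/gdb.ada/fun_addr
# The absolute path is unaffected by the cd.
print(resolve(outputs, abs_srcdir))    # /src/gdb/testsuite/gdb.ada/fun_addr
```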
abs_srcdir = @abs_srcdir@

target_alias = @target_noncanonical@
program_transform_name = @program_transform_name@
build_canonical = @build@
host_canonical = @host@
target_canonical = @target@

SHELL = @SHELL@
EXEEXT = @EXEEXT@
SUBDIRS = @subdirs@
RPATH_ENVVAR = @RPATH_ENVVAR@

EXTRA_RULES = @EXTRA_RULES@

CC=@CC@

EXPECT = `if [ "$${READ1}" != "" ] ; then \
	    echo $${rootme}/expect-read1; \
	  elif [ -f $${rootme}/../../expect/expect ] ; then \
	    echo $${rootme}/../../expect/expect ; \
	  else \
	    echo expect ; \
	  fi`
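The EXPECT snippet above picks an expect binary in priority order. A small Python model of that choice, for illustration only: `pick_expect` is a hypothetical name, `rootme` stands for the directory DO_RUNTEST sets with `pwd`, and `read1` for the READ1 variable that `check-read1` passes.

```python
import os


def pick_expect(rootme: str, read1: str = "") -> str:
    """Python model of the shell snippet that chooses the expect binary."""
    if read1 != "":
        # check-read1 sets READ1, selecting the read1-instrumented wrapper.
        return f"{rootme}/expect-read1"
    in_tree = f"{rootme}/../../expect/expect"
    if os.path.isfile(in_tree):  # `[ -f ... ]` is true for an existing regular file
        return in_tree
    return "expect"  # otherwise rely on whatever `expect` is on PATH
```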
RUNTEST = $(RUNTEST_FOR_TARGET)

RUNTESTFLAGS =

FORCE_PARALLEL =
Improve analysis of racy testcases
This is an initial attempt to introduce some mechanisms to identify
racy testcases present in our testsuite. As can be seen in previous
discussions, racy tests are really bothersome and cause our BuildBot
to pollute the gdb-testers mailing list with hundreds of
false-positive messages every month. Hopefully, identifying these
racy tests in advance (and automatically) will contribute to the
reduction of noise traffic to gdb-testers, maybe to the point where we
will be able to send the failure messages directly to the authors of
the commits.
I spent some time trying to decide the best way to tackle this
problem, and decided that there is no silver bullet. Racy tests are
tricky and it is difficult to catch them, so the best solution I could
find (for now?) is to run our testsuite a number of times in a row,
and then compare the results (i.e., the gdb.sum files generated during
each run). The more times you run the tests, the more racy tests you
are likely to detect (at the expense of waiting longer and longer).
You can also run the tests in parallel, which makes things faster (and
contributes to catching more racy tests, because your machine will have
fewer resources for each test and some of them are likely to fail when
this happens). I did some tests in my machine (8-core i7, 16GB RAM),
and running the whole GDB testsuite 5 times using -j6 took 23 minutes.
Not bad.
In order to run the racy test machinery, you need to specify the
RACY_ITER environment variable. You will assign a number to this
variable, which represents the number of times you want to run the
tests. So, for example, if you want to run the whole testsuite 3
times in parallel (using 2 cores), you will do:
make check RACY_ITER=3 -j2
It is also possible to use the TESTS variable and specify which tests
you want to run:
make check TESTS='gdb.base/default.exp' RACY_ITER=3 -j2
And so on. The output files will be put in the directory
gdb/testsuite/racy_outputs/.
After make invokes the necessary rules to run the tests, it finally
runs a Python script that will analyze the resulting gdb.sum files.
This Python script will read each file, and construct a series of sets
based on the results of the tests (one set for FAIL's, one for
PASS'es, one for KFAIL's, etc.). It will then do some set operations
and come up with a list of unique, sorted testcases that are racy.
The algorithm behind this is:
  for each test; do
    if the test's state (PASS, FAIL, XFAIL, XPASS, ...) is the same in every sumfile; then
      it is not racy
    else
      it is racy
    fi
  done
(The algorithm is actually a bit more complex than that, because it
takes into account other things in order to decide whether the test
should be ignored or not).
IOW, a test must have the same state in every sumfile.
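A minimal Python sketch of the core rule ("a test must have the same state in every sumfile"). It is not analyze-racy-logs.py, which the message says is more complex; the function names, the list of DejaGnu states, and the choice to treat a test missing from some run as racy are all assumptions made for illustration.

```python
from collections import defaultdict

# DejaGnu result states that can start a .sum line (assumed list).
STATES = ("PASS", "FAIL", "XFAIL", "XPASS", "KFAIL", "KPASS",
          "UNRESOLVED", "UNSUPPORTED", "UNTESTED")


def parse_sum(text):
    """Map test name -> state for one gdb.sum file (lines like 'PASS: name')."""
    results = {}
    for line in text.splitlines():
        state, sep, name = line.partition(": ")
        if sep and state in STATES:  # skips "Running ..." and summary lines
            results[name.strip()] = state
    return results


def racy_tests(sum_texts):
    """Return the sorted names whose state is not identical in every run."""
    runs = [parse_sum(t) for t in sum_texts]
    states = defaultdict(set)
    for name in set().union(*runs):
        for run in runs:
            # Assumption: a test absent from a run counts as a different state.
            states[name].add(run.get(name, "MISSING"))
    return sorted(name for name, seen in states.items() if len(seen) > 1)


run1 = "PASS: gdb.base/a.exp: x\nFAIL: gdb.base/b.exp: y\n"
run2 = "PASS: gdb.base/a.exp: x\nPASS: gdb.base/b.exp: y\n"
print(racy_tests([run1, run2]))  # ['gdb.base/b.exp: y']
```

Duplicate names within one file would overwrite each other in `parse_sum`, which is why the non-unique messages discussed next need a suffix first.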
After processing everything, the script prints the racy tests it could
identify on stdout. I am redirecting this to a file named racy.sum.
Something else that I wasn't sure how to deal with was non-unique
messages in our testsuite. I decided to do the same thing I do in our
BuildBot: include a unique identifier at the end of the message, like:
gdb.base/xyz.exp: non-unique message
gdb.base/xyz.exp: non-unique message <<2>>
This means that you will have to be careful about them when you use
the racy.sum file.
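The suffixing scheme shown above can be sketched in a few lines of Python. `make_unique` is a hypothetical helper; numbering the third and later repeats `<<3>>`, `<<4>>`, ... is an assumption extrapolated from the `<<2>>` example.

```python
from collections import Counter


def make_unique(names):
    """Append ' <<N>>' to the N-th occurrence of a test name (N starts at 2)."""
    seen = Counter()
    unique = []
    for name in names:
        seen[name] += 1
        count = seen[name]
        unique.append(name if count == 1 else f"{name} <<{count}>>")
    return unique


print(make_unique(["gdb.base/xyz.exp: non-unique message"] * 2))
# ['gdb.base/xyz.exp: non-unique message', 'gdb.base/xyz.exp: non-unique message <<2>>']
```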
I ran the script several times here, and it did a good job catching
some well-known racy tests. Overall, I am satisfied with this
approach and I think it will be helpful to have it upstream'ed. I
also intend to extend our BuildBot and create new, specialized
builders that will be responsible for detecting the racy tests every X
number of days.
2016-03-05 Sergio Durigan Junior <sergiodj@redhat.com>
* Makefile.in (DEFAULT_RACY_ITER): New variable.
(CHECK_TARGET_TMP): Likewise.
(check-single-racy): New rule.
(check-parallel-racy): Likewise.
(TEST_TARGETS): Adjust rule to account for RACY_ITER.
(do-check-parallel-racy): New rule.
(check-racy/%.exp): Likewise.
* README (Racy testcases): New section.
* analyze-racy-logs.py: New file.
2016-03-06 01:37:11 +00:00
# Default number of iterations that we will use to run the testsuite
# if the user does not specify the RACY_ITER environment variable
# (e.g., when the user calls the make rule directly from the command
# line).
DEFAULT_RACY_ITER = 3

RUNTEST_FOR_TARGET = `\
	if [ -f $${srcdir}/../../dejagnu/runtest ]; then \
	  echo $${srcdir}/../../dejagnu/runtest; \
	else \
	  if [ "$(host_canonical)" = "$(target_canonical)" ]; then \
	    echo runtest; \
	  else \
	    t='$(program_transform_name)'; echo runtest | sed -e $$t; \
	  fi; \
	fi`

#### host, target, and site specific Makefile frags come in here.

# The use of $$(x_FOR_TARGET) reduces the command line length by not
# duplicating the lengthy definition.

TARGET_FLAGS_TO_PASS = \
	"prefix=$(prefix)" \
	"exec_prefix=$(exec_prefix)" \
	"against=$(against)" \
	'CC=$$(CC_FOR_TARGET)' \
	"CC_FOR_TARGET=$(CC_FOR_TARGET)" \
	"CFLAGS=$(TESTSUITE_CFLAGS)" \
	'CXX=$$(CXX_FOR_TARGET)' \
	"CXX_FOR_TARGET=$(CXX_FOR_TARGET)" \
	"CXXFLAGS=$(CXXFLAGS)" \
	"MAKEINFO=$(MAKEINFO)" \
	"INSTALL=$(INSTALL)" \
	"INSTALL_PROGRAM=$(INSTALL_PROGRAM)" \
	"INSTALL_DATA=$(INSTALL_DATA)" \
	"LDFLAGS=$(LDFLAGS)" \
	"LIBS=$(LIBS)" \
	"RUNTEST=$(RUNTEST)" \
	"RUNTESTFLAGS=$(RUNTESTFLAGS)"

all: $(EXTRA_RULES)
@echo "Nothing to be done for all..."
.NOEXPORT:
INFODIRS=doc
info:
install-info:
dvi:
pdf:
install-pdf:
html:
install-html:

install:

uninstall: force

# Use absolute `site.exp' path everywhere to suppress VPATH lookups for it.
# Bare `site.exp' is used as a target here if user requests it explicitly.
# $(RUNTEST) is looking up `site.exp' only in the current directory.

$(abs_builddir)/site.exp site.exp: ./config.status Makefile
@echo "Making a new config file..."
-@rm -f ./tmp?
@touch site.exp
-@mv site.exp site.bak
@echo "## these variables are automatically generated by make ##" > ./tmp0
@echo "# Do not edit here. If you wish to override these values" >> ./tmp0
@echo "# add them to the last section" >> ./tmp0
@echo "set host_triplet ${host_canonical}" >> ./tmp0
@echo "set target_alias $(target_alias)" >> ./tmp0
@echo "set target_triplet ${target_canonical}" >> ./tmp0
@echo "set build_triplet ${build_canonical}" >> ./tmp0
@echo "set srcdir ${abs_srcdir}" >> ./tmp0
@echo "set tool gdb" >> ./tmp0
@echo 'source $${srcdir}/lib/append_gdb_boards_dir.exp' >> ./tmp0
@echo "## All variables above are generated by configure. Do Not Edit ##" >> ./tmp0
@cat ./tmp0 > site.exp
@cat site.bak | sed \
-e '1,/^## All variables above are.*##/ d' >> site.exp
-@rm -f ./tmp?
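The site.exp recipe above regenerates the configure-controlled header and then uses `sed -e '1,/^## All variables above are.*##/ d'` to keep only what the user added after the marker line. A Python model of just that merge step, as a sketch: `regenerate_site_exp` is a hypothetical name, and the tmp0/site.bak file shuffling is left out.

```python
import re

MARKER = re.compile(r"^## All variables above are.*##")


def regenerate_site_exp(generated: str, old_site_exp: str) -> str:
    """Combine a freshly generated header with the user's part of the old file.

    Mirrors `sed -e '1,/re/ d'`: delete from line 1 through the first marker
    line (sed starts looking for the end address on line 2), keep the rest.
    If no marker is found, sed deletes to end of file, so nothing is kept.
    """
    old_lines = old_site_exp.splitlines(keepends=True)
    user_part = []
    for index, line in enumerate(old_lines):
        if index > 0 and MARKER.match(line):
            user_part = old_lines[index + 1:]
            break
    return generated + "".join(user_part)


marker = "## All variables above are generated by configure. Do Not Edit ##\n"
header = "set tool gdb\n" + marker
old = "set tool old\n" + marker + "set my_board_option foo\n"  # hypothetical user line
print(regenerate_site_exp(header, old), end="")
```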
installcheck:
switch to fully parallel mode
This switches "make check" to fully parallel mode.
One primary issue facing full parallelization is the overhead of
"runtest". On my machine, if I "touch gdb.base/empty.exp", making a
new file, and then "time runtest.exp", it takes 0.08 seconds.
Multiply this by the 1008 (in my configuration) tests and you get ~80
seconds. This is the overhead that would theoretically be present if
all tests were run in parallel.
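The overhead estimate above, written out as a quick Python calculation. The inputs are the figures quoted in the message; dividing by the job count assumes the perfect parallelization the message itself mentions next, and the variable names are illustrative.

```python
per_invocation = 0.08  # seconds of runtest start-up, as measured above
tests = 1008           # number of .exp files in that configuration

serial_overhead = per_invocation * tests
print(f"{serial_overhead:.1f} s")  # 80.6 s

# Assuming perfect parallelization, each job only pays its share.
for jobs in (1, 2, 3):
    print(jobs, round(serial_overhead / jobs, 1))
```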
However, the problem isn't nearly as bad as this, for two reasons.
First, you must divide by the number of jobs, assuming perfect
parallelization -- reasonably true for small -j numbers, based on the
results I see.
Second, the current test suite parallelization approach bundles the
tests, largely by directory, but also splitting up gdb.base into two
halves.
I was curious to see how the current bundling played out in practice,
so I ran "make -j1 check RUNTEST='/bin/time runtest'". This invokes
the parallel mode (thus the bundling) and then shows the time taken by
each invocation of runtest.
Then, I ran "/bin/time make -j3 check". (See below about -j2.)
The time for the entire -j3 test run was the same as the time for
"gdb.base1". What this means is that gdb.base1 is currently the
time-limiting run, preventing further parallelization gains.
So, I reason, whatever overhead we see from full parallelization will
only be seen by "-j1" and "-j2".
I then tried a -j2 test run. This does take longer than a -j3 build,
meaning that the gdb.base1 job finishes and then proceeds to other
runtest invocations.
Finally I tried a -j2 test run with the appended patch.
This was 9% slower than the -j2 run without the patch.
I think that is a reasonable slowdown for what is probably a rare
case. I believe this patch will yield faster test results for all -j
values greater than 2. For -j3 on my machine, the test suite is a few
seconds faster; I didn't try any larger -j values.
For -j1, I went ahead and changed the Makefile so that, if no -j
option is given, then the "check-single" mode is used. You can still
use "make -j1 check" to get single-job parallel-mode, though of course
there's no good reason to do so.
This change is likely to speed up the plain "make check" scenario a
little as we will now bypass dg-extract-results.sh.
One drawback of this change is that "make -jN check" is now much more
verbose. I generally only look at the .sum and .log files, but
perhaps this will bother some.
Another interesting question is scalability of the result. The
slowest test, which limits the scalability, took 80.78 seconds. The
mean of the remaining tests is 1.08 seconds. (Note that this is just
a rough estimate, since there are still outliers.)
This means we can run 80.78 / 1.08 =~ 74 tests in the time available.
And, in this data set (slightly older than the above, but materially
the same) there were 948 tests. So, I think the current test suite
should scale ok up to about -j12.
We could improve this number if need be by breaking up the biggest
tests.
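The scalability argument above as a small Python calculation. It keeps the message's simplification that every test other than the slowest one takes the mean time; the variable names are illustrative.

```python
slowest = 80.78    # seconds taken by the slowest single test
mean_rest = 1.08   # mean seconds for each of the remaining tests
total_tests = 948  # tests in that data set

# While one job runs the slowest test, another job can finish this many tests.
tests_per_slot = slowest / mean_rest
# Beyond this many jobs, the slowest test alone bounds the wall-clock time.
useful_jobs = total_tests / tests_per_slot
print(round(tests_per_slot, 1), round(useful_jobs, 1))  # 74.8 12.7
```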
2013-11-04 Tom Tromey <tromey@redhat.com>
* Makefile.in (TEST_DIRS): Remove.
(TEST_TARGETS, check-parallel): Rewrite.
(check-gdb.%, BASE1_FILES, BASE2_FILES, check-gdb.base%)
(subdir_do, subdirs): Remove.
(do-check-parallel, check/%): New targets.
(clean): Remove outputs, temp, and cache directories.
(saw_dash_j): New variable.
(CHECK_TARGET): Use it.
(check): Depend on all, site.exp. Rewrite.
(check-single): Remove dependencies.
(slow_tests, all_tests, reordered_tests): New variables.
2013-08-27 17:52:25 +00:00
# See whether -j was given to make. Either it was given with no
# arguments, and appears as "j" in the first word, or it was given an
# argument and appears as "-j" in a separate word.
@GMAKE_TRUE@saw_dash_j = $(or $(findstring j,$(firstword $(MAKEFLAGS))),$(filter -j,$(MAKEFLAGS)))
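A Python model of the `saw_dash_j` expression, to make the three GNU make functions explicit. The MAKEFLAGS strings in the tests are hypothetical inputs; the exact text GNU make puts in MAKEFLAGS varies between versions.

```python
def saw_dash_j(makeflags: str) -> str:
    """Python model of the Makefile's saw_dash_j expression.

    $(findstring j,$(firstword $(MAKEFLAGS)))  -> "j" if the first word contains 'j'
    $(filter -j,$(MAKEFLAGS))                  -> every word that is exactly "-j"
    $(or a,b)                                  -> the first non-empty of a and b
    """
    words = makeflags.split()
    first = words[0] if words else ""
    found = "j" if "j" in first else ""
    filtered = " ".join(w for w in words if w == "-j")
    return found or filtered
```

Any non-empty result counts as "true" in the `$(if $(saw_dash_j),...)` test used by CHECK_TARGET_TMP.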
# For GNU make, try to run the tests in parallel if any -j option is
# given. If RUNTESTFLAGS is not empty, then by default the tests will
# be serialized. This can be overridden by setting FORCE_PARALLEL to
# any non-empty value. For a non-GNU make, do not parallelize.
@GMAKE_TRUE@CHECK_TARGET_TMP = $(if $(FORCE_PARALLEL),check-parallel,$(if $(RUNTESTFLAGS),check-single,$(if $(saw_dash_j),check-parallel,check-single)))
@GMAKE_TRUE@CHECK_TARGET = $(if $(RACY_ITER),$(addsuffix -racy,$(CHECK_TARGET_TMP)),$(CHECK_TARGET_TMP))
@GMAKE_FALSE@CHECK_TARGET = check-single
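The nested `$(if ...)` calls above are easier to follow as a decision chain. A Python model of the GNU make branch, as a sketch: `check_target` is a hypothetical function whose arguments stand for the expanded values of FORCE_PARALLEL, RUNTESTFLAGS, saw_dash_j and RACY_ITER.

```python
def _true(value: str) -> bool:
    # GNU make's $(if ...) treats a condition that is empty after stripping as false.
    return value.strip() != ""


def check_target(force_parallel="", runtestflags="", saw_dash_j="", racy_iter=""):
    """Python model of CHECK_TARGET_TMP / CHECK_TARGET for GNU make."""
    if _true(force_parallel):
        tmp = "check-parallel"
    elif _true(runtestflags):
        tmp = "check-single"  # explicit RUNTESTFLAGS serialize by default
    elif _true(saw_dash_j):
        tmp = "check-parallel"
    else:
        tmp = "check-single"
    # $(addsuffix -racy,...) is applied only when RACY_ITER is non-empty.
    return tmp + "-racy" if _true(racy_iter) else tmp


print(check_target())                                # check-single
print(check_target(saw_dash_j="j"))                  # check-parallel
print(check_target(saw_dash_j="j", racy_iter="3"))   # check-parallel-racy
```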
# Note that we must resort to a recursive make invocation here,
# because GNU make 3.82 has a bug preventing MAKEFLAGS from being used
# in conditions.
check: all $(abs_builddir)/site.exp
$(MAKE) $(CHECK_TARGET)
check-read1:
$(MAKE) READ1="1" check
# All the hair to invoke dejagnu. A given invocation can just append
# $(RUNTESTFLAGS)
DO_RUNTEST = \
	rootme=`pwd`; export rootme; \
	srcdir=${srcdir} ; export srcdir ; \
	EXPECT=${EXPECT} ; export EXPECT ; \
	EXEEXT=${EXEEXT} ; export EXEEXT ; \
	$(RPATH_ENVVAR)=$$rootme/../../expect:$$rootme/../../libstdc++:$$rootme/../../tk/unix:$$rootme/../../tcl/unix:$$rootme/../../bfd:$$rootme/../../opcodes:$$$(RPATH_ENVVAR); \
	export $(RPATH_ENVVAR); \
	if [ -f $${rootme}/../../expect/expect ] ; then \
	  TCL_LIBRARY=$${srcdir}/../../tcl/library ; \
	  export TCL_LIBRARY ; fi ; \
	$(RUNTEST) --status

# TESTS exists for the user to pass on the command line to easily
# say "Only run these tests." With check-single it's not necessary, but
# with check-parallel there's no other way to (easily) specify a subset
# of tests. For consistency we support it for check-single as well.
# To specify all tests in a subdirectory, use TESTS=gdb.subdir/*.exp.
# E.g., make check TESTS="gdb.server/*.exp gdb.threads/*.exp".
@GMAKE_TRUE@TESTS :=
@GMAKE_FALSE@TESTS =

@GMAKE_TRUE@ifeq ($(strip $(TESTS)),)
@GMAKE_TRUE@expanded_tests_or_none :=
@GMAKE_TRUE@else
@GMAKE_TRUE@expanded_tests := $(patsubst $(srcdir)/%,%,$(wildcard $(addprefix $(srcdir)/,$(TESTS))))
@GMAKE_TRUE@expanded_tests_or_none := $(or $(expanded_tests),no-matching-tests-found)
@GMAKE_TRUE@endif
@GMAKE_FALSE@expanded_tests_or_none = $(TESTS)
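A Python model of how the GNU make branch above turns TESTS patterns into the list handed to runtest. This is a sketch: `expand_tests` is a hypothetical name, and results are sorted for determinism even though the ordering of GNU make's `$(wildcard)` differs between make versions.

```python
import glob
import os
import tempfile


def expand_tests(srcdir: str, tests: str):
    """Model of expanded_tests_or_none: glob each TESTS pattern under srcdir,
    then strip the srcdir/ prefix again, as the $(patsubst ...) does."""
    patterns = tests.split()
    if not patterns:
        return []  # empty TESTS: no explicit test list is passed to runtest
    prefix = os.path.join(srcdir, "")  # srcdir with a trailing separator
    matches = []
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(srcdir, pattern))):
            matches.append(path[len(prefix):] if path.startswith(prefix) else path)
    return matches or ["no-matching-tests-found"]


# Demo against a throwaway directory tree.
with tempfile.TemporaryDirectory() as demo_srcdir:
    os.makedirs(os.path.join(demo_srcdir, "gdb.base"))
    for name in ("b.exp", "a.exp"):
        open(os.path.join(demo_srcdir, "gdb.base", name), "w").close()
    demo = expand_tests(demo_srcdir, "gdb.base/*.exp")
    missing = expand_tests(demo_srcdir, "gdb.nothing/*.exp")
print(demo, missing)
```

Passing the sentinel `no-matching-tests-found` rather than an empty list keeps a mistyped TESTS value from silently running the whole testsuite.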
# Shorthand for running all the tests in a single directory.
@GMAKE_TRUE@check-gdb.%:
@GMAKE_TRUE@ $(MAKE) check TESTS="gdb.$*/*.exp"
|
|
|
|
|
switch to fully parallel mode
This switches "make check" to fully parallel mode.
One primary issue facing full parallelization is the overhead of
"runtest". On my machine, if I "touch gdb.base/empty.exp", making a
new file, and then "time runtest.exp", it takes 0.08 seconds.
Multiply this by the 1008 (in my configuration) tests and you get ~80
seconds. This is the overhead that would theoretically be present if
all tests were run in parallel.
However, the problem isn't nearly as bad as this, for two reasons.
First, you must divide by the number of jobs, assuming perfect
parallelization -- reasonably true for small -j numbers, based on the
results I see.
Second, the current test suite parallelization approach bundles the
tests, largely by directory, but also splitting up gdb.base into two
halves.
I was curious to see how the current bundling played out in practice,
so I ran "make -j1 check RUNTEST='/bin/time runtest'". This invokes
the parallel mode (thus the bundling) and then shows the time taken by
each invocation of runtest.
Then, I ran "/bin/time make -j3 check". (See below about -j2.)
The time for the entire -j3 test run was the same as the time for
"gdb.base1". What this means is that gdb.base1 is currently the
time-limiting run, preventing further parallelization gains.
So, I reason, whatever overhead we see from full parallelization will
only be seen by "-j1" and "-j2".
I then tried a -j2 test run. This does take longer than a -j3 build,
meaning that the gdb.base1 job finishes and then proceeds to other
runtest invocations.
Finally I tried a -j2 test run with the appended patch.
This was 9% slower than the -j2 run without the patch.
I think that is a reasonable slowdown for what is probably a rare
case. I believe this patch will yield faster test results for all -j
values greater than 2. For -j3 on my machine, the test suite is a few
seconds faster; I didn't try any larger -j values.
For -j1, I went ahead and changed the Makefile so that, if no -j
option is given, then the "check-single" mode is used. You can still
use "make -j1 check" to get single-job parallel-mode, though of course
there's no good reason to do so.
This change is likely to speed up the plain "make check" scenario a
little as we will now bypass dg-extract-results.sh.
One drawback of this change is that "make -jN check" is now much more
verbose. I generally only look at the .sum and .log files, but
perhaps this will bother some.
Another interesting question is scalability of the result. The
slowest test, which limits the scalability, took 80.78 seconds. The
mean of the remaining tests is 1.08 seconds. (Note that this is just
a rough estimate, since there are still outliers.)
This means we can run 80.78 / 1.08 =~ 74 tests in the time available.
And, in this data set (slightly older than the above, but materially
the same) there were 948 tests. So, I think the current test suite
should scale ok up to about -j12.
We could improve this number if need be by breaking up the biggest
tests.
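The scalability estimate above can be reproduced as a small calculation. This is a sketch that only restates the figures quoted in this message (80.78 s, 1.08 s, 948 tests). The function name `scalability_estimate` is hypothetical, and the model ignores scheduling overhead and the remaining outliers.

```python
def scalability_estimate(slowest_s, mean_other_s, total_tests):
    """Rough -j ceiling when one slow test bounds the wall-clock time.

    While the slowest test occupies one job, every other job can get
    through about slowest_s / mean_other_s average-length tests.  Once
    there are enough jobs to finish all tests inside that window, a
    larger -j stops helping: the slowest test alone sets the run time.
    """
    tests_per_job = slowest_s / mean_other_s
    useful_jobs = total_tests / tests_per_job
    return tests_per_job, useful_jobs


if __name__ == "__main__":
    per_job, jobs = scalability_estimate(80.78, 1.08, 948)
    print(f"~{int(per_job)} tests per job, useful up to about -j{int(jobs)}")
```

With the numbers above this yields roughly 74 tests per job slot and a ceiling of about -j12, matching the estimate in the text.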
2013-11-04 Tom Tromey <tromey@redhat.com>
* Makefile.in (TEST_DIRS): Remove.
(TEST_TARGETS, check-parallel): Rewrite.
(check-gdb.%, BASE1_FILES, BASE2_FILES, check-gdb.base%)
(subdir_do, subdirs): Remove.
(do-check-parallel, check/%): New targets.
(clean): Remove outputs, temp, and cache directories.
(saw_dash_j): New variable.
(CHECK_TARGET): Use it.
(check): Depend on all, site.exp. Rewrite.
(check-single): Remove dependencies.
(slow_tests, all_tests, reordered_tests): New variables.
2013-08-27 17:52:25 +00:00
check-single:
2016-01-19 16:06:11 +00:00
$(DO_RUNTEST) $(RUNTESTFLAGS) $(expanded_tests_or_none)
2009-06-29 16:41:45 +00:00
Improve analysis of racy testcases
This is an initial attempt to introduce some mechanisms to identify
racy testcases present in our testsuite. As can be seen in previous
discussions, racy tests are really bothersome and cause our BuildBot
to pollute the gdb-testers mailing list with hundreds of
false-positive messages every month. Hopefully, identifying these
racy tests in advance (and automatically) will contribute to the
reduction of noise traffic to gdb-testers, maybe to the point where we
will be able to send the failure messages directly to the authors of
the commits.
I spent some time trying to decide the best way to tackle this
problem, and decided that there is no silver bullet. Racy tests are
tricky and it is difficult to catch them, so the best solution I could
find (for now?) is to run our testsuite a number of times in a row,
and then compare the results (i.e., the gdb.sum files generated during
each run). The more times you run the tests, the more racy tests you
are likely to detect (at the expense of waiting longer and longer).
You can also run the tests in parallel, which makes things faster (and
contributes to catching more racy tests, because your machine will have
fewer resources for each test and some of them are likely to fail when
this happens). I did some tests on my machine (8-core i7, 16GB RAM),
and running the whole GDB testsuite 5 times using -j6 took 23 minutes.
Not bad.
In order to run the racy test machinery, you need to specify the
RACY_ITER environment variable. You will assign a number to this
variable, which represents the number of times you want to run the
tests. So, for example, if you want to run the whole testsuite 3
times in parallel (using 2 cores), you will do:
make check RACY_ITER=3 -j2
It is also possible to use the TESTS variable and specify which tests
you want to run:
make check TESTS='gdb.base/default.exp' RACY_ITER=3 -j2
And so on. The output files will be put in the directory
gdb/testsuite/racy_outputs/.
After make invokes the necessary rules to run the tests, it finally
runs a Python script that will analyze the resulting gdb.sum files.
This Python script will read each file, and construct a series of sets
based on the results of the tests (one set for FAILs, one for
PASSes, one for KFAILs, etc.). It will then do some set operations
and come up with a list of unique, sorted testcases that are racy.
The algorithm behind this is:
for test in all tests; do
  if there is a $state in PASS, FAIL, XFAIL, XPASS... such that
     the test's state in every sumfile is $state; then
    it is not racy
  else
    it is racy
  fi
done
(The algorithm is actually a bit more complex than that, because it
takes into account other things in order to decide whether the test
should be ignored or not).
IOW, a test must have the same state in every sumfile.
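The "same state in every sumfile" rule can be sketched in Python as follows. This is a simplified illustration, not the contents of analyze-racy-logs.py: the names `parse_sum` and `find_racy` are hypothetical, it assumes DejaGnu result lines of the form `STATE: test name`, it only compares tests that appear in every run, and it does not implement the extra ignore rules mentioned above or the handling of repeated test names.

```python
import re
from collections import defaultdict

# DejaGnu result lines look like "PASS: gdb.base/foo.exp: some message".
RESULT_RE = re.compile(
    r"^(PASS|FAIL|XPASS|XFAIL|KPASS|KFAIL|UNRESOLVED|UNTESTED|UNSUPPORTED): (.+)$"
)


def parse_sum(lines):
    """Return {test name: state} for the result lines of one .sum file."""
    results = {}
    for line in lines:
        match = RESULT_RE.match(line.rstrip("\n"))
        if match:
            state, name = match.groups()
            results[name] = state  # a repeated name keeps its last state
    return results


def find_racy(sumfiles):
    """Return the sorted names of tests whose state differs between runs.

    `sumfiles` is a list of runs, each given as an iterable of lines.
    """
    runs = [parse_sum(lines) for lines in sumfiles]
    if not runs:
        return []
    # Only tests present in every run can be compared state by state.
    common = set(runs[0]).intersection(*runs[1:])
    states = defaultdict(set)
    for run in runs:
        for name in common:
            states[name].add(run[name])
    # Non-racy means exactly one state was seen across all runs.
    return sorted(name for name, seen in states.items() if len(seen) > 1)
```

Each run can be passed as an open file (for example racy_outputs/1/gdb.sum, racy_outputs/2/gdb.sum, ...), since a file object iterates over its lines.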
After processing everything, the script prints the racy tests it could
identify on stdout. I am redirecting this to a file named racy.sum.
Something else that I wasn't sure how to deal with was non-unique
messages in our testsuite. I decided to do the same thing I do in our
BuildBot: include a unique identifier at the end of the message, like:
gdb.base/xyz.exp: non-unique message
gdb.base/xyz.exp: non-unique message <<2>>
This means that you will have to be careful about them when you use
the racy.sum file.
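The disambiguation of repeated messages could look like the sketch below. The numbering scheme (first occurrence unchanged, later ones suffixed " <<2>>", " <<3>>", ...) is inferred from the example above, and `uniquify` is a hypothetical helper, not code taken from the BuildBot or from analyze-racy-logs.py.

```python
from collections import Counter


def uniquify(names):
    """Suffix the Nth occurrence (N >= 2) of a repeated name with " <<N>>"."""
    seen = Counter()
    unique = []
    for name in names:
        seen[name] += 1
        count = seen[name]
        unique.append(name if count == 1 else f"{name} <<{count}>>")
    return unique
```

Applying it to the test names of each run in file order, before comparing runs, makes the second occurrence of a message in one run line up with the second occurrence in another.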
I ran the script several times here, and it did a good job catching
some well-known racy tests. Overall, I am satisfied with this
approach and I think it will be helpful to have it upstreamed. I
also intend to extend our BuildBot and create new, specialized
builders that will be responsible for detecting the racy tests every X
number of days.
2016-03-05 Sergio Durigan Junior <sergiodj@redhat.com>
* Makefile.in (DEFAULT_RACY_ITER): New variable.
(CHECK_TARGET_TMP): Likewise.
(check-single-racy): New rule.
(check-parallel-racy): Likewise.
(TEST_TARGETS): Adjust rule to account for RACY_ITER.
(do-check-parallel-racy): New rule.
(check-racy/%.exp): Likewise.
* README (Racy testcases): New section.
* analyze-racy-logs.py: New file.
2016-03-06 01:37:11 +00:00
check-single-racy:
-rm -rf cache racy_outputs temp
mkdir -p racy_outputs; \
racyiter="$(RACY_ITER)"; \
test "x$$racyiter" = "x" && \
racyiter=$(DEFAULT_RACY_ITER); \
if test $$racyiter -lt 2 ; then \
echo "RACY_ITER must be at least 2."; \
exit 1; \
fi; \
trap "exit" INT; \
for n in `seq $$racyiter` ; do \
mkdir -p racy_outputs/$$n; \
$(DO_RUNTEST) --outdir=racy_outputs/$$n $(RUNTESTFLAGS) \
$(expanded_tests_or_none); \
done; \
$(srcdir)/analyze-racy-logs.py \
`ls racy_outputs/*/gdb.sum` > racy.sum; \
sed -n '/=== gdb Summary ===/,$$ p' racy.sum
2009-06-29 16:41:45 +00:00
check-parallel:
2014-02-19 00:01:34 +00:00
-rm -rf cache outputs temp
2013-08-27 17:52:25 +00:00
$(MAKE) -k do-check-parallel; \
2016-01-18 19:07:10 +00:00
result=$$?; \
2009-06-29 16:41:45 +00:00
$(SHELL) $(srcdir)/dg-extract-results.sh \
2013-08-27 17:52:25 +00:00
`find outputs -name gdb.sum -print` > gdb.sum; \
2009-06-29 16:41:45 +00:00
$(SHELL) $(srcdir)/dg-extract-results.sh -L \
2016-01-18 19:07:10 +00:00
`find outputs -name gdb.log -print` > gdb.log; \
sed -n '/=== gdb Summary ===/,$$ p' gdb.sum; \
exit $$result
2013-08-27 17:52:25 +00:00
2016-03-06 01:37:11 +00:00
check-parallel-racy:
-rm -rf cache racy_outputs temp
racyiter="$(RACY_ITER)"; \
test "x$$racyiter" = "x" && \
racyiter=$(DEFAULT_RACY_ITER); \
if test $$racyiter -lt 2 ; then \
echo "RACY_ITER must be at least 2."; \
exit 1; \
fi; \
trap "exit" INT; \
for n in `seq $$racyiter` ; do \
$(MAKE) -k do-check-parallel-racy \
RACY_OUTPUT_N=$$n; \
$(SHELL) $(srcdir)/dg-extract-results.sh \
`find racy_outputs/$$n -name gdb.sum -print` > \
racy_outputs/$$n/gdb.sum; \
$(SHELL) $(srcdir)/dg-extract-results.sh -L \
`find racy_outputs/$$n -name gdb.log -print` > \
racy_outputs/$$n/gdb.log; \
sed -n '/=== gdb Summary ===/,$$ p' racy_outputs/$$n/gdb.sum; \
done; \
$(srcdir)/analyze-racy-logs.py \
`ls racy_outputs/*/gdb.sum` > racy.sum; \
sed -n '/=== gdb Summary ===/,$$ p' racy.sum
2013-08-27 17:52:25 +00:00
# Turn a list of .exp files into "check/" targets. Only examine .exp
# files appearing in a gdb.* directory -- we don't want to pick up
# lib/ by mistake. For example, gdb.linespec/linespec.exp becomes
# check/gdb.linespec/linespec.exp. The list is generally sorted
# alphabetically, but we take a few tests known to be slow and push
# them to the front of the list to try to lessen the overall time
# taken by the test suite -- if one of these tests happens to be run
# late, it will cause the overall time to increase.
2014-02-19 00:11:02 +00:00
@GMAKE_TRUE@ifeq ($(strip $(TESTS)),)
2013-08-27 17:52:25 +00:00
slow_tests = gdb.base/break-interp.exp gdb.base/interp.exp \
gdb.base/multi-forks.exp
@GMAKE_TRUE@all_tests := $(shell cd $(srcdir) && find gdb.* -name '*.exp' -print)
@GMAKE_TRUE@reordered_tests := $(slow_tests) $(filter-out $(slow_tests),$(all_tests))
|
Improve analysis of racy testcases
This is an initial attempt to introduce some mechanisms to identify
racy testcases present in our testsuite. As can be seen in previous
discussions, racy tests are really bothersome and cause our BuildBot
to pollute the gdb-testers mailing list with hundreds of
false-positives messages every month. Hopefully, identifying these
racy tests in advance (and automatically) will contribute to the
reduction of noise traffic to gdb-testers, maybe to the point where we
will be able to send the failure messages directly to the authors of
the commits.
I spent some time trying to decide the best way to tackle this
problem, and decided that there is no silver bullet. Racy tests are
tricky and it is difficult to catch them, so the best solution I could
find (for now?) is to run our testsuite a number of times in a row,
and then compare the results (i.e., the gdb.sum files generated during
each run). The more times you run the tests, the more racy tests you
are likely to detect (at the expense of waiting longer and longer).
You can also run the tests in parallel, which makes things faster (and
contribute to catching more racy tests, because your machine will have
less resources for each test and some of them are likely to fail when
this happens). I did some tests in my machine (8-core i7, 16GB RAM),
and running the whole GDB testsuite 5 times using -j6 took 23 minutes.
Not bad.
In order to run the racy test machinery, you need to specify the
RACY_ITER environment variable. You will assign a number to this
variable, which represents the number of times you want to run the
tests. So, for example, if you want to run the whole testsuite 3
times in parallel (using 2 cores), you will do:
make check RACY_ITER=3 -j2
It is also possible to use the TESTS variable and specify which tests
you want to run:
make check TEST='gdb.base/default.exp' RACY_ITER=3 -j2
And so on. The output files will be put at the directory
gdb/testsuite/racy_outputs/.
After make invokes the necessary rules to run the tests, it finally
runs a Python script that will analyze the resulting gdb.sum files.
This Python script will read each file, and construct a series of sets
based on the results of the tests (one set for FAILs, one for
PASSes, one for KFAILs, etc.). It will then do some set operations
and come up with a list of unique, sorted testcases that are racy.
The algorithm behind this is:
  for state in PASS, FAIL, XFAIL, XPASS...; do
    if a test's state in every sumfile is $state; then
      it is not racy
    else
      it is racy
    fi
  done
(The algorithm is actually a bit more complex than that, because it
takes into account other things in order to decide whether the test
should be ignored or not).
IOW, a test must have the same state in every sumfile.
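The set operations described above can be sketched in Python. This is a minimal illustration, not the code in analyze-racy-logs.py: it assumes each result line in a gdb.sum file has the form `STATE: test name`, and the helpers `parse_sum` and `find_racy` are hypothetical names. For each state, a test that appears in that state in some runs but not in all of them (union minus intersection) is reported as racy.

```python
import re
from collections import defaultdict

# Assumed result-line format: "STATE: test name".  The list of states is
# illustrative; adjust it to whatever states your gdb.sum files contain.
RESULT_RE = re.compile(
    r"^(PASS|FAIL|XFAIL|XPASS|KFAIL|KPASS|UNRESOLVED|UNTESTED|UNSUPPORTED): (.*)$"
)


def parse_sum(text):
    """Map each state to the set of test names that ended in that state."""
    by_state = defaultdict(set)
    for line in text.splitlines():
        match = RESULT_RE.match(line)
        if match:
            by_state[match.group(1)].add(match.group(2))
    return by_state


def find_racy(sum_texts):
    """Return the sorted test names whose state is not the same in every run."""
    runs = [parse_sum(text) for text in sum_texts]
    states = set().union(*(run.keys() for run in runs))
    racy = set()
    for state in states:
        per_run = [run.get(state, set()) for run in runs]
        # Seen in this state in some run, but not in every run: racy.
        racy |= set.union(*per_run) - set.intersection(*per_run)
    return sorted(racy)


if __name__ == "__main__":
    run1 = "PASS: gdb.base/a.exp: x\nFAIL: gdb.base/a.exp: y\n"
    run2 = "PASS: gdb.base/a.exp: x\nPASS: gdb.base/a.exp: y\n"
    print(find_racy([run1, run2]))  # ['gdb.base/a.exp: y']
```

Running something like this over the gdb.sum file of each iteration gives the kind of sorted list that ends up in racy.sum; the extra "ignore" rules mentioned above are not modeled.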
After processing everything, the script prints the racy tests it could
identify on stdout. I am redirecting this to a file named racy.sum.
Something else that I wasn't sure how to deal with was non-unique
messages in our testsuite. I decided to do the same thing I do in our
BuildBot: include a unique identifier at the end of the message, like:
gdb.base/xyz.exp: non-unique message
gdb.base/xyz.exp: non-unique message <<2>>
This means that you will have to be careful about them when you use
the racy.sum file.
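The `<<N>>` numbering can be sketched like this, assuming (as in the example) that the first occurrence keeps its plain name and the Nth repeat gets ` <<N>>` appended. `make_unique` is a hypothetical helper, not a function taken from analyze-racy-logs.py.

```python
from collections import Counter


def make_unique(names):
    """Append " <<N>>" to the Nth occurrence (N >= 2) of a repeated name."""
    seen = Counter()
    unique = []
    for name in names:
        seen[name] += 1
        count = seen[name]
        unique.append(name if count == 1 else f"{name} <<{count}>>")
    return unique


if __name__ == "__main__":
    for line in make_unique(["gdb.base/xyz.exp: non-unique message"] * 2):
        print(line)
```

Applying this to each sum file before building the per-state sets keeps repeated messages from collapsing into a single set entry.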
I ran the script several times here, and it did a good job catching
some well-known racy tests. Overall, I am satisfied with this
approach and I think it will be helpful to have it upstreamed. I
also intend to extend our BuildBot and create new, specialized
builders that will be responsible for detecting the racy tests every X
number of days.
2016-03-05 Sergio Durigan Junior <sergiodj@redhat.com>
* Makefile.in (DEFAULT_RACY_ITER): New variable.
(CHECK_TARGET_TMP): Likewise.
(check-single-racy): New rule.
(check-parallel-racy): Likewise.
(TEST_TARGETS): Adjust rule to account for RACY_ITER.
(do-check-parallel-racy): New rule.
(check-racy/%.exp): Likewise.
* README (Racy testcases): New section.
* analyze-racy-logs.py: New file.
2016-03-06 01:37:11 +00:00
@GMAKE_TRUE@TEST_TARGETS := $(addprefix $(if $(RACY_ITER),check-racy,check)/,$(reordered_tests))
2014-02-19 00:11:02 +00:00
@GMAKE_TRUE@else
2016-03-06 01:37:11 +00:00
@GMAKE_TRUE@TEST_TARGETS := $(addprefix $(if $(RACY_ITER),check-racy,check)/,$(expanded_tests_or_none))
2014-02-19 00:11:02 +00:00
@GMAKE_TRUE@endif
switch to fully parallel mode
This switches "make check" to fully parallel mode.
One primary issue facing full parallelization is the overhead of
"runtest". On my machine, if I "touch gdb.base/empty.exp", making a
new file, and then "time runtest.exp", it takes 0.08 seconds.
Multiply this by the 1008 (in my configuration) tests and you get ~80
seconds. This is the overhead that would theoretically be present if
all tests were run in parallel.
However, the problem isn't nearly as bad as this, for two reasons.
First, you must divide by the number of jobs, assuming perfect
parallelization -- reasonably true for small -j numbers, based on the
results I see.
Second, the current test suite parallelization approach bundles the
tests, largely by directory, but also splitting up gdb.base into two
halves.
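The overhead estimate above is plain arithmetic. A sketch, using the 0.08-second start-up cost and 1008 tests quoted in the text and assuming perfect parallelization across jobs (the function name is hypothetical):

```python
def runtest_overhead(per_call_s, n_tests, jobs=1):
    """Total runtest start-up cost, split evenly over `jobs` parallel jobs."""
    return per_call_s * n_tests / jobs


if __name__ == "__main__":
    for jobs in (1, 2, 4):
        print(f"-j{jobs}: ~{runtest_overhead(0.08, 1008, jobs):.1f} s")
```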
I was curious to see how the current bundling played out in practice,
so I ran "make -j1 check RUNTEST='/bin/time runtest'". This invokes
the parallel mode (thus the bundling) and then shows the time taken by
each invocation of runtest.
Then, I ran "/bin/time make -j3 check". (See below about -j2.)
The time for the entire -j3 test run was the same as the time for
"gdb.base1". What this means is that gdb.base1 is currently the
time-limiting run, preventing further parallelization gains.
So, I reason, whatever overhead we see from full parallelization will
only be seen by "-j1" and "-j2".
I then tried a -j2 test run. This does take longer than a -j3 build,
meaning that the gdb.base1 job finishes and then proceeds to other
runtest invocations.
Finally I tried a -j2 test run with the appended patch.
This was 9% slower than the -j2 run without the patch.
I think that is a reasonable slowdown for what is probably a rare
case. I believe this patch will yield faster test results for all -j
values greater than 2. For -j3 on my machine, the test suite is a few
seconds faster; I didn't try any larger -j values.
For -j1, I went ahead and changed the Makefile so that, if no -j
option is given, then the "check-single" mode is used. You can still
use "make -j1 check" to get single-job parallel-mode, though of course
there's no good reason to do so.
This change is likely to speed up the plain "make check" scenario a
little as we will now bypass dg-extract-results.sh.
One drawback of this change is that "make -jN check" is now much more
verbose. I generally only look at the .sum and .log files, but
perhaps this will bother some.
Another interesting question is scalability of the result. The
slowest test, which limits the scalability, took 80.78 seconds. The
mean of the remaining tests is 1.08 seconds. (Note that this is just
a rough estimate, since there are still outliers.)
This means we can run 80.78 / 1.08 =~ 74 tests in the time available.
And, in this data set (slightly older than the above, but materially
the same) there were 948 tests. So, I think the current test suite
should scale ok up to about -j12.
We could improve this number if need be by breaking up the biggest
tests.
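The scalability estimate can be written out the same way: the slowest test fixes the wall-clock time, and the mean of the other tests says how many of them one job can run in that window. A sketch with the figures quoted above; rounding down to a whole job count is this sketch's own choice, and the function name is hypothetical:

```python
import math


def max_useful_jobs(slowest_s, mean_other_s, n_tests):
    """Rough upper bound on -j before the slowest test dominates the run."""
    # How many average tests fit in the time taken by the slowest one.
    tests_per_job = slowest_s / mean_other_s
    return math.floor(n_tests / tests_per_job)


if __name__ == "__main__":
    print(max_useful_jobs(80.78, 1.08, 948))
```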
2013-11-04 Tom Tromey <tromey@redhat.com>
* Makefile.in (TEST_DIRS): Remove.
(TEST_TARGETS, check-parallel): Rewrite.
(check-gdb.%, BASE1_FILES, BASE2_FILES, check-gdb.base%)
(subdir_do, subdirs): Remove.
(do-check-parallel, check/%): New targets.
(clean): Remove outputs, temp, and cache directories.
(saw_dash_j): New variable.
(CHECK_TARGET): Use it.
(check): Depend on all, site.exp. Rewrite.
(check-single): Remove dependencies.
(slow_tests, all_tests, reordered_tests): New variables.
2013-08-27 17:52:25 +00:00
do-check-parallel: $(TEST_TARGETS)
@:
@GMAKE_TRUE@check/%.exp:
@GMAKE_TRUE@ -mkdir -p outputs/$*
2016-01-19 16:06:11 +00:00
@GMAKE_TRUE@ @$(DO_RUNTEST) GDB_PARALLEL=yes --outdir=outputs/$* $*.exp $(RUNTESTFLAGS)
1999-04-16 01:35:26 +00:00
2016-03-06 01:37:11 +00:00
do-check-parallel-racy: $(TEST_TARGETS)
@:
@GMAKE_TRUE@check-racy/%.exp:
@GMAKE_TRUE@ -mkdir -p racy_outputs/$(RACY_OUTPUT_N)/$*
@GMAKE_TRUE@ $(DO_RUNTEST) GDB_PARALLEL=yes \
@GMAKE_TRUE@ --outdir=racy_outputs/$(RACY_OUTPUT_N)/$* $*.exp \
@GMAKE_TRUE@ $(RUNTESTFLAGS)
2014-02-19 00:11:02 +00:00
check/no-matching-tests-found:
@echo ""
@echo "No matching tests found."
@echo ""
2015-08-03 16:17:40 +00:00
# Utility rule invoked by step 2 of the build-perf rule.
@GMAKE_TRUE@workers/%.worker:
@GMAKE_TRUE@ mkdir -p gdb.perf/outputs/$*
2016-01-19 16:06:11 +00:00
@GMAKE_TRUE@ $(DO_RUNTEST) --outdir=gdb.perf/outputs/$* lib/build-piece.exp WORKER=$* GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=build-pieces
2015-08-03 16:17:40 +00:00
# Utility rule to build tests that support it in parallel.
# The build is broken into 3 steps distinguished by GDB_PERFTEST_SUBMODE:
# gen-workers, build-pieces, final.
#
# GDB_PERFTEST_MODE appears *after* RUNTESTFLAGS here because we don't want
# anything in RUNTESTFLAGS to override it.
#
# We don't delete the outputs directory here as these programs can take
# a while to build, and perftest.exp has support for deciding whether to
# recompile them. If you want to remove these directories, make clean.
#
# The point of step 1 is to construct the set of worker tasks for step 2.
# All of the information needed by build-piece.exp is contained in the name
# of the generated .worker file.
@GMAKE_TRUE@build-perf: $(abs_builddir)/site.exp
@GMAKE_TRUE@ rm -rf gdb.perf/workers
@GMAKE_TRUE@ mkdir -p gdb.perf/workers
@GMAKE_TRUE@ @: Step 1: Generate the build .worker files.
2016-01-19 16:06:11 +00:00
@GMAKE_TRUE@ $(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf/workers GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=gen-workers
2015-08-03 16:17:40 +00:00
@GMAKE_TRUE@ @: Step 2: Compile the pieces. Here is the build parallelism.
@GMAKE_TRUE@ $(MAKE) $$(cd gdb.perf && echo workers/*/*.worker)
@GMAKE_TRUE@ @: Step 3: Do the final link.
2016-01-19 16:06:11 +00:00
@GMAKE_TRUE@ $(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=final
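Step 2 relies on the .worker file names alone. A hedged Python sketch of that mapping: `worker_targets` mimics the `$$(cd gdb.perf && echo workers/*/*.worker)` expansion, and `worker_name` mimics the `$*` stem that the `workers/%.worker` rule passes on as `WORKER=$*`. Both helper names and the example file name in the test are hypothetical.

```python
from pathlib import Path


def worker_targets(perf_dir):
    """List workers/*/*.worker paths relative to perf_dir, like the shell glob."""
    base = Path(perf_dir)
    return sorted(str(p.relative_to(base)) for p in base.glob("workers/*/*.worker"))


def worker_name(target):
    """Return the stem that make binds to $* for a workers/%.worker target."""
    prefix, suffix = "workers/", ".worker"
    if not (target.startswith(prefix) and target.endswith(suffix)):
        raise ValueError(f"not a worker target: {target}")
    return target[len(prefix):-len(suffix)]


if __name__ == "__main__":
    for target in worker_targets("gdb.perf"):
        print(target, "->", worker_name(target))
```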
2015-08-03 16:17:40 +00:00
# The default is to both compile and run the tests.
GDB_PERFTEST_MODE = both
2013-09-25 04:41:45 +00:00
check-perf: all $(abs_builddir)/site.exp
@if test ! -d gdb.perf; then mkdir gdb.perf; fi
2015-08-03 16:17:40 +00:00
$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf GDB_PERFTEST_MODE=$(GDB_PERFTEST_MODE) $(RUNTESTFLAGS)
2013-09-25 04:41:45 +00:00
1999-04-16 01:35:26 +00:00
force:;
clean mostlyclean:
2011-02-22 20:52:49 +00:00
-rm -f *~ core *.o a.out xgdb *.x *.grt bigcore.corefile .gdb_history
2015-01-13 14:59:32 +00:00
-rm -f core.* *.tf *.cl tracecommandsscript copy1.txt zzz-gdbscript
2012-05-17 19:03:59 +00:00
-rm -f *.dwo *.dwp
2013-08-27 17:52:25 +00:00
-rm -rf outputs temp cache
2015-08-03 16:17:40 +00:00
-rm -rf gdb.perf/workers gdb.perf/outputs gdb.perf/temp gdb.perf/cache
2014-08-20 17:55:54 +00:00
-rm -f read1.so expect-read1
1999-04-16 01:35:26 +00:00
distclean maintainer-clean realclean: clean
-rm -f *~ core
-rm -f Makefile config.status *-init.exp
-rm -fr *.log summary detail *.plog *.sum *.psum site.*
2007-11-17 00:54:18 +00:00
Makefile : Makefile.in config.status $(host_makefile_frag)
1999-04-16 01:35:26 +00:00
$(SHELL) config.status
config.status: configure
$(SHELL) config.status --recheck
2011-03-07 17:03:51 +00:00
TAGS: force
2011-03-07 22:02:45 +00:00
find $(srcdir) -name '*.exp' -print | \
etags --regex='/proc[ \t]+\([^ \t]+\)/\1/' -
2014-08-20 17:55:54 +00:00
# Build the expect wrapper script that preloads the read1.so library.
expect-read1:
@echo Making expect-read1
@rm -f expect-read1-tmp
@touch expect-read1-tmp
@echo "# THIS FILE IS GENERATED -*- buffer-read-only: t -*- \n" >>expect-read1-tmp
@echo "# vi:set ro: */\n\n" >>expect-read1-tmp
@echo "# To regenerate this file, run:\n" >>expect-read1-tmp
@echo "# make clean; make/\n" >>expect-read1-tmp
@echo "export LD_PRELOAD=`pwd`/read1.so" >>expect-read1-tmp
@echo 'exec expect "$$@"' >>expect-read1-tmp
@chmod +x expect-read1-tmp
@mv expect-read1-tmp expect-read1
# Build the read1.so preload library. This overrides the `read'
# function, making it read one byte at a time. Running the testsuite
# with this catches racy tests.
read1.so: lib/read1.c
$(CC) -o $@ ${srcdir}/lib/read1.c -Wall -g -shared -fPIC $(CFLAGS)
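To see why reading one byte at a time exposes racy tests, here is a Python analogy. It is not how read1.so works (that library replaces the C `read` function through LD_PRELOAD); the `Read1` wrapper is hypothetical. Every read returns at most one byte, so code that assumes a whole prompt arrives in a single read, much like an expect pattern that only matches when the buffer happens to be complete, sees a partial buffer instead.

```python
import io


class Read1(io.RawIOBase):
    """Wrap a binary stream so that every read returns at most one byte."""

    def __init__(self, raw):
        super().__init__()
        self._raw = raw

    def readable(self):
        return True

    def readinto(self, buf):
        if len(buf) == 0:
            return 0
        data = self._raw.read(1)  # never fetch more than one byte
        buf[: len(data)] = data
        return len(data)


if __name__ == "__main__":
    prompt = b"(gdb) "
    # A single read sees the whole prompt normally, but only "(" through Read1.
    print(io.BytesIO(prompt).read(4096), Read1(io.BytesIO(prompt)).read(4096))
```

Reading until end of file still returns the complete data; only code that relies on the size of individual reads behaves differently.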
# Build the read1 machinery.
.PHONY: read1
read1: read1.so expect-read1