-
Data is an important corporate asset.
Doesn’t it make sense to invest the effort required to validate the quality
of data via effective testing? My July 2006
survey into the current state of data management indicates that 95.7% of
respondents believe that data is a corporate asset. Yet of them only
40.3% had a database test suite in place to validate the data and of those
without a test suite only 31.6% had even discussed the concept.
-
Mission-critical business functionality is implemented in
RDBMSs. In the
survey, 63.7% of respondents indicated that their organizations
did this, but of those only 46% had regression tests in place to validate
the logic. Shouldn’t we be doing better?
-
Current approaches aren’t sufficient. The current state of
the art in many organizations is for data professionals to control changes to
the database schemas, for developers to visually inspect the database
during construction, and to perform some form of formal testing
during the test phase at the end of the lifecycle. Unfortunately,
none of these approaches prove effective. Application developers will
often go around their organization’s data management group because they find
them too difficult to work with, too slow in the way they work, or sometimes
they don’t even know they should be working together. The end result
is that the teams don’t follow the desired data quality procedures and as a
result quality suffers. Although visual inspection
of query results is a good start it is little more than a debugging technique in
practice that will help you to find problems but not prevent them. Testing late in the lifecycle is better than nothing, but as Barry
Boehm noted in the early 80s it’s
incredibly
expensive to fix any defects you find at that point.
-
Testing provides the concrete feedback required to
identify defects. How do you know how good the quality of your
source data actually is without an effective test suite which you can run
whenever you need to?
-
Support for evolutionary development. Many
evolutionary development techniques, in particular
database
refactoring, are predicated upon the idea that it must be possible to
determine if something in the database has been broken when a change has
been made. The easiest way to do that is to simply run your regression
test suite.
An important part of writing database tests is the creation
of test data. You have several
strategies for doing so:
-
Have source test data.
You can maintain an external definition of the test data, perhaps in flat
files, XML files, or a secondary set of tables. This data would be
loaded in from the external source as needed.
-
Test data creation scripts.
You develop and maintain scripts, perhaps using data manipulation language (DML)
SQL code or simply application source code (e.g. Java or C#), which does the
necessary deletions, insertions, and/or updates required to create the test
data.
-
Self-contained test cases.
Each individual test case puts the database into a known state required for
the test.
These approaches to creating test data can be used alone or
in combination. A significant
advantage of writing creation scripts and self-contained test cases is that it
is much more likely that the developers of that code will place it under configuration
management (CM) control. Although
it is possible to put test data itself under CM control, worst case you generate
an export file that you check in, this isn’t a common practice and therefore
may not occur as frequently as required. Choose
an approach that reflects the culture of your organization.
Where does test data come from? For unit testing, I prefer to
create sample data with known values. This way I can predict the
actual results for the tests that I do write and I know I have the
appropriate data values for those tests. For other forms of
testing — particularly load/stress, system integration, and function
testing, I will use live data so as to better simulate real-world
conditions.
| |
| Beware Coupling:
One danger with database regression testing, and
with regression testing in general, is coupling between tests.
If you put the database into a known state, then run several
tests against that known state before resetting it, then those
tests are potentially coupled to one another. Coupling
between tests occurs when one test counts on another one to
successfully run so as to put the database into a known state
for it. Self-contained test cases do not suffer from this
problem, although may be potentially slower as a result due to
the need for additional initialization steps.
|
|
4.4 What Testing Tools Are Available?
I believe that there are several critical features which you need to
successfully test RDBMSs. First, as Figure 1
implies you need two categories of database testing tools, one for interface tests
and one for internal database tests. Second, these testing tools should support the language that you’re developing
in. For example, for internal database testing if you’re a Microsoft SQL Server developer, your T-SQL
procedures should likely be tested using some form of T-SQL framework.
Similarly, Oracle DBAs should have a PL-SQL-based unit testing framework.
Third, you need tools which help you to put your database into a known state,
which implies the need not only for test data generation but also for managing
that data (like other critical development assets, test data should be under
configuration management control).
To make a long story short, although we’re starting to see a glimmer of hope
when it comes to database testing tools, as you can see in
Table 2, but we still have a long way to go. Luckily there are some good tools being developed by the open source software
(OSS) community and there are some commercial tools available as well.
Having said that, IMHO there is still significant opportunity for tool
vendors to improve their database testing offerings.
-
Use an in-memory database for regression testing.
You can dramatically speed up your database tests by running them, or at
least portions of them, against an in-memory database such as
HSQLDB. The challenge with this
approach is that because database methods are implemented differently across
database vendors that any method tests will still need to run against the
actual database server.
-
Start fresh each major test run. To ensure
a clean database, a common strategy is that at the beginning of each test
run you drop the database, then
rebuild it from scratch taking into account all
database
refactorings and transformations to that point, then reload the test data,
and then run your tests. Of course, you wouldn’t do this to your
production database.
-
Take a continuous approach to regression testing.
I can’t say this enough, a
TDD approach to
development is an incredibly effective way to work.
-
Train people in testing. Many developers
and DBAs have not been trained in testing skills, and they almost certainly
haven’t been trained in database testing skills. Invest in your
people, and give them the
training
and education they need to do their jobs.
-
Pair with novices with people that have database testing experience.
One of the easiest ways to gain database testing skills is to pair program
with someone who already has them.