
4 minute read - Technical Testing

How to write a simple random test data sentence generator

Sep 21, 2016

TLDR; I wrote a random test data generator, not a slogan generator

On the Importance of Test Data

We all know that test data is really really important.

If you don’t have the right data, you can’t test the functionality. You can’t check that an email is sent out on the customer’s 65th Birthday unless you have a customer who has a date of birth that will trigger that functionality.

Some data is path invariant and has to be specific to control the path.

We know this.

But we don’t always randomise enough of our data, so our test data becomes stale.

One of my hobbies - randomly generating test data

Periodically I write code to randomly generate data. It’s easier than writing a full compiler and interpreter but still keeps my hand in at parsing text. You can find old notes and tools on test data.

My most recent public test data utility attempts to randomly recreate some of the cartoon ‘slogans’ from my book “Dear Evil Tester”:

  • “Of course I’m not Evil… do I look Evil?”
  • “Are you a good little tester? I’m better than that, I’m Eeevil!”
  • “I’m not evil, I’m just doing WHATEVER it takes”
  • etc.

You get the idea. If you want more, you can read the book or try out my Sloganizer online.

My Sloganizer is a Test Data Generator

Since I’m still learning JavaScript, I created a really simple string generator. If you read the code in sloganizer.html you’ll see it, but I’m going to explain the algorithm here in case you want to use it in your own test data generation work.

I have an array of strings which are sentence templates e.g.

  • “#start I’m not #im_not”
  • “#start I’m #im_not”

Everything starting with a “#” is a ‘macro’, everything else is a string literal.
The ‘macros’ are a hash of:

  • ‘key’ - which matches the macro name e.g. “start” and “im_not”
  • ‘value’ - which is an array of strings, where the string might be another macro or a literal

e.g.

"start" : ["", "Of course", "I honestly believe", "I really do think"],
"im_not" : ["evil", "good", "nasty", "unpleasant"],

And I have a recursive function which, given a string, will:

  • work through the string
  • if it finds a ‘macro’ name then it randomly chooses a string from the macro array and expands it
  • if it finds a literal then it adds it to the output string

Pretty simple.
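The steps above can be sketched in JavaScript roughly as follows. This is a minimal illustration, not the code from sloganizer.html: the names `expand`, `generate` and the `random` parameter are hypothetical, and I am assuming a macro name is a "#" followed by letters, digits or underscores.

```javascript
// A sketch under stated assumptions, not the actual Sloganizer code.
const macros = {
  start: ["", "Of course", "I honestly believe", "I really do think"],
  im_not: ["evil", "good", "nasty", "unpleasant"],
};

// Replace every "#name" in `text` with a randomly chosen value from
// macros[name]. Literal text passes through unchanged.
// `random` is injectable so that results can be made repeatable in checks.
function expand(text, macros, random = Math.random) {
  return text.replace(/#(\w+)/g, (match, name) => {
    if (!Object.prototype.hasOwnProperty.call(macros, name)) {
      throw new Error(`No macro named "${name}"`);
    }
    const choices = macros[name];
    const choice = choices[Math.floor(random() * choices.length)];
    // The chosen value may itself contain macros, so expand it recursively.
    return expand(choice, macros, random);
  });
}

// Expand a template, then tidy the spacing left behind by empty
// values such as the "" entry in "start".
function generate(template, macros, random = Math.random) {
  return expand(template, macros, random).replace(/\s+/g, " ").trim();
}

// Prints one randomly generated sentence.
console.log(generate("#start I'm not #im_not", macros));
```

Using a regular expression with a replacement callback keeps the sketch short. A hand-written scanner that walks the string character by character would work just as well.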

So “#start I’m not #im_not” might generate:

  • I’m not good
  • Of course I’m not nasty
  • I really do think I’m not evil

The code isn’t particularly forgiving when given bad data:

  • it could get into an infinite loop if a macro string references itself
  • if a macro doesn’t have an entry in the hash then the code will throw an exception

The code doesn’t ‘compile’ the sentences or phrases to find these problems in advance (although it could, if I wrote code to do that).
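A ‘compile’ step of that kind might look something like the sketch below. It is an assumption about how such a check could be written, not something the Sloganizer does: `findMacroProblems` is a hypothetical helper that reports macros with no entry in the hash and macros that can reach themselves, directly or through other macros.

```javascript
// A sketch of an up-front check; hypothetical, not part of the Sloganizer.
// Returns a list of problem descriptions (empty when nothing was found).
function findMacroProblems(templates, macros) {
  const problems = [];
  const referencesIn = (text) => [...text.matchAll(/#(\w+)/g)].map((m) => m[1]);
  const isDefined = (name) => Object.prototype.hasOwnProperty.call(macros, name);

  // 1. Every macro referenced in a template or a macro value needs an entry.
  const missing = new Set();
  for (const text of [...templates, ...Object.values(macros).flat()]) {
    for (const name of referencesIn(text)) {
      if (!isDefined(name)) missing.add(name);
    }
  }
  for (const name of missing) problems.push(`undefined macro: ${name}`);

  // 2. Depth-first search for macros that can lead back to themselves.
  const state = new Map(); // name -> "visiting" | "done"
  const visit = (name, path) => {
    if (!isDefined(name) || state.get(name) === "done") return;
    if (state.get(name) === "visiting") {
      problems.push(`cycle: ${[...path, name].join(" -> ")}`);
      return;
    }
    state.set(name, "visiting");
    for (const value of macros[name]) {
      for (const ref of referencesIn(value)) visit(ref, [...path, name]);
    }
    state.set(name, "done");
  };
  Object.keys(macros).forEach((name) => visit(name, []));

  return problems;
}

// Example with deliberately bad data: "missing" has no entry, "a" references itself.
console.log(findMacroProblems(["#a #missing"], { a: ["x #a"] }));
```

Note that this check is conservative. A macro that references itself in only one of several values might still terminate by chance, but it is reported as a cycle anyway.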

But it does work, and it will generate thousands, if not millions, of random sentences.

What’s the point?

The point is that:

  • It doesn’t take much to create random data.
  • It doesn’t take a long time to write utility functions to generate random data.
  • Even if you can’t find a library that you like for the language you use, you could write your own, or probably re-purpose a template engine to create data.

And, more dangerously… it’s fun to write random data generation code.

Note:

“The Evil Tester Sloganizer” is one of the examples used in the “Testing JavaScript from the Browser Dev Tools Console” section of my “Technical Web Testing 101” course. This section has over an hour of video explaining how to test JavaScript applications from the browser console. When you learn to do this you open up a whole new way of interacting with, and automating, JavaScript applications. Check out the course and you can learn about dev tools, proxies, REST testing, mobile web testing and more.