TLDR; Reverse counterstrings are easier to generate. Creating same output forward is harder but might be useful for streaming or files.
I assume everyone has heard of and used CounterStrings. I came across them because James Bach wrote about them and created the perlclip tool to generate them.
*3*5*7*9*12*15*
Over the years I’ve written a few utilities for generating CounterStrings for a variety of platforms. I had to implement them in Excel once because we weren’t allowed to install any test tools. Fortunately, with Excel we had VBA and could write anything we wanted.
I’ll describe the steps I’ve taken to create a Predictive Forward CounterString Algorithm.
Reverse Counterstrings
The simplest, and most elegant way to create a CounterString seems to be the approach used in PerlClip.
- start at the number you want
- convert it into a “*number”
- then generate the next number, and add it to the String
- when finished
- reverse the string
Perl seems like a pretty efficient language, I don’t know if it reverses strings in the same memory buffer or not.
But back in the olden days I used to worry about memory usage a lot. Stems from growing up with 48K.
Very often with CounterStrings we generate large strings. And it is possible to blow out the memory allocated to your process.
I haven’t seen that happen with PerlClip.
I did see that happen with the Excel version I wrote
And I have run out of heap space with the Java version I wrote.
The most obvious way to reverse a String is to:
- create a new string
- copy all the chars from the old string into the new string from back to front
Which would, at a minimum, double the memory usage for the String.
Perhaps there is a way to generate CounterStrings forward instead of reversing?
Forward CounterStrings
A forward counterstring, without predicting the *
boundaries would likely generate something like this for 15
:
2*4*6*8*11*14*1
You can still pretty much tell that it is 15 chars in length, but if we want to use the notion of “Self Describing Test Data” that James Bach set out to achieve with CounterStrings then we didn’t do a particularly good job.
The problem with generating CounterStrings forward is that you need to predict certain things.
- should you start with
*
or with2
?
There are some choices that are pretty easy to predict. e.g.
Does the value I want to hit fit in the remaining spaces? If so then use it. i.e. after 7
we can predict the next string.
*3*5*7*9*
*3*5*7*10*
If we didn’t do this then 10
would have ended up as *3*5*7*9*1
assuming we managed to predict that we needed to start with *
But… it isn’t always that easy. Consider 15 and 16
*3*5*7*9*12*15*
*3*5*7*10*13*16*
The algorithm would have to predict that it needed to start with *
and it needed to make a different decision in the middle of the string.
The Forward CounterString algorithm that I used in Excel, didn’t predict. The results were functional but not aesthetic.
Predictive Forward CounterStrings
I did find a way to predict, but I couldn’t find a way to do it elegantly.
After staring at far too many counterstrings I thought:
- I know what the final number is
- I need to figure out the last number that has the same number of chars
- and then just keep chunking back
e.g. with the 15 example above
*3*5*7*9*12*15*
- I know I want to end in 15.
- If I can calculate the last three char number i.e. 12
- Then the rest is easy
And with 16
*3*5*7*10*13*16*
- I know I want to end in 16.
- If I can calculate the last three char number i.e. 10
- Then the rest is easy
All I really have to predict are the boundaries, then I can fill in the blanks.
And after trial and error, I wrote code to do just that:
private int calculateLowestXDigitNumberForThisRange(int maxNumberWithXDigits, String spacer) {
int displayLengthInStringWithLimiterAppended = getLengthOfNumberInString(maxNumberWithXDigits, spacer);
//*3*5*7*10*13*16*19*22*25*28*31*34*37*40*43*46*49*52*55*58*61*64*
//67*70*73*76*79*82*85*88*91*94*97*101*105*109*113*117*121*125*
// e.g. 125 has 3 chars
// and display length with limiter would be 4 i.e. 125*
int numberHasXChars = String.valueOf(maxNumberWithXDigits).length();
// length - lowest x digit number
// lowest 2 digit number is 10 to the power of (2-1) = 10
// e.g. lowest 3 digit number is 10 to the power of (3-1) = 100
int lowestXDigitNumber = (int) Math.pow(10, (numberHasXChars - 1));
// what is the gap between our highest range number and the lowest in the range?
// e.g. 125 - 100 = 25
int differenceOfThisDigitValues = maxNumberWithXDigits - lowestXDigitNumber;
// how many X digit numbers are in this range? (integer division)
// e.g. 25/4 = 6 how many 4 character numbers fit in 25? == 6 125*
int numberOfThisDigitValues = differenceOfThisDigitValues / displayLengthInStringWithLimiterAppended;
// but we already displayed one
numberOfThisDigitValues++;
// 125 - (4*7) == 97
int lowestXCharNumberIs = maxNumberWithXDigits
- (displayLengthInStringWithLimiterAppended * numberOfThisDigitValues);
if(lowestXCharNumberIs==0){
// if it is 0 then it is really 2 because we can't have 0 as the lowest number
lowestXCharNumberIs=2;
}
// length of 97 with spacer is 3
int lengthOfLowestAsCounterString = getLengthOfNumberInString(lowestXCharNumberIs, spacer);
// 3 != 4 so 97 is not lowest, lowest must be 97 + 4 = 101
if (lengthOfLowestAsCounterString != displayLengthInStringWithLimiterAppended) {
//it is not an X char number so next X char number is lowest + lengthOfNumberInString
lowestXCharNumberIs = lowestXCharNumberIs + displayLengthInStringWithLimiterAppended;
}
return lowestXCharNumberIs;
}
I have refactored to get that code, and removed some code for unnecessary edge cases.
I can probably tidy up the general objects that I’ve used:
But I’m OK with it as it stands.
And you can mess with it in JavaScript if that is more your thing
For the Mathematically Inclined
I have no doubt that there is a more elegant solution for those who are mathematically inclined. Or perhaps, your external review, of my approach and code, will reveal some shortcuts that can help.
I’m happy to learn of better ways to do this.
Why forward generation?
Generating a CounterString using a Predictive Forward method means that it is easier to generate counterstrings for:
- large files
- streaming systems
Potentially, I could save memory and make things faster by allocating a fixed char array and populating it in one pass.
- saving memory
- generating strings faster
And, because… the challenge.
If you have any links to other CounterString implementations then feel free to leave them as comments so we can all learn from others.