Machine learning applications are often hard to test due to their imminent lack of transparency during runtime. Evolutionary algorithms can, however be tested if some assumptions can be made.

For the sake of simplicity, we can assume the following:

  • The application consists of a single public function bool guessBool(text).
  • The input text should allegedly contain some text representation of a bool.
  • We have some samples that can be used to train the underlying models.
  • We have some other samples that we can test the method with.
  • We do not test metrics with training data.
  • The underlying model is somewhat exposed during training and testing.

Now, we need to identify the most important aspects of our function. These aspects will vary greatly from application to application. For reference, I will use the following criteria:

  1. The function should be as accurate as possible in terms of exact matches.
  2. The underlying model should improve over time.

Now, we can see that the goals are connected to each other. In order to test the functionality, we can use the first criterion as a metric to test the second.

After converting the first criteria to a metric, we can test the method through the following pseudo-code:

modelAccuracy(trainingSamples, testingSamples) {
    model = new BoolGuesser(trainingSamples)

    accuracy = 0
    for sample in testingSamples:
        bool guess = model.guessBool(sample.text)
        if guess == sample.realValue:
            accuracy += 1

    return accuracy / trainingSamples.size
}

y = x * 10
accuracyX = modelAccuracy(getTrainingSamples(x), getTestingSamples(x))
accuracyY = modelAccuracy(getTrainingSamples(y), getTestingSamples(y))

if accuracyY < accuracyX:
    fail()
else
    success()

It is important that we do not fail if the accuracy is the same. Thus, a strict comparison is used with the condition that we fail the test if the accuracy is not worse. In our case, we may not be able to guarantee strict improvement.

What if the accuracy is already 100%? Of course, we would be happy. Sadly, the world is not perfect, and 100% accuracy is rarely achieved in machine learning applications.

Now, please prove me wrong with your excellent machine learning algorithms.