Client: Internal Project
Project Duration: 8 months (ongoing)
Goal: Generate data that simulates production-like values and distributions for use in unit, integration, system, and performance testing, as well as for training machine learning models and algorithms.
Tech: Java SE
Data you can reason about is better than totally random data. It helps you spot edge-case errors in the system and gives you more confidence in its correctness. The problem is that such data takes time to create, and creating it by hand is a manual process that can introduce errors. A library that can generate this kind of data has many use cases.
- The client does not have a data set that can test the infrastructure.
- The client cannot share actual production data, but is able to describe it.
- Creating test data for integration and performance testing is time-consuming, and the result usually does not reflect production-like values and distributions.
- We wanted to create data that is as realistic as possible in order to reason about the system much more easily than with totally random data.
- We wanted a flexible, easy-to-use library that allows us to generate production-like data declaratively.
- Manual, imperative test data creation is error-prone.
We decided on a lightweight library that can be integrated easily into any Java project and saves you the bother of writing test data by hand or building your own test data generation library. The library should be well documented, with plenty of examples, so that a new user can get started quickly. Needing more than 2-3 days to learn it is not an option.
We created two APIs: a Java API with many helper methods that ease use, shorten the syntax, and keep the code elegant, and a YAML configuration API, a DSL that is even more expressive than the Java API. Every option is thoroughly explained in the documentation, with examples.
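To illustrate what declaratively describing a production-like distribution can look like in plain Java SE, here is a minimal sketch. It is not the library's actual API: the class `WeightedValues` and its methods `seeded`, `with`, and `next` are hypothetical names invented for this example, and the country codes and weights are made-up sample values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch (not the library's API): values are declared together
 * with relative weights, and next() samples them in that proportion.
 */
public final class WeightedValues<T> {
    private final List<T> values = new ArrayList<>();
    private final List<Double> cumulativeWeights = new ArrayList<>();
    private final Random random;
    private double totalWeight = 0.0;

    private WeightedValues(long seed) {
        // A fixed seed makes the generated sequence reproducible across runs.
        this.random = new Random(seed);
    }

    public static <T> WeightedValues<T> seeded(long seed) {
        return new WeightedValues<>(seed);
    }

    /** Declares one possible value and its relative weight. */
    public WeightedValues<T> with(T value, double weight) {
        if (weight <= 0) {
            throw new IllegalArgumentException("weight must be positive");
        }
        totalWeight += weight;
        values.add(value);
        cumulativeWeights.add(totalWeight);
        return this;
    }

    /** Returns one of the declared values, chosen in proportion to its weight. */
    public T next() {
        if (values.isEmpty()) {
            throw new IllegalStateException("no values configured");
        }
        double pick = random.nextDouble() * totalWeight;
        for (int i = 0; i < values.size(); i++) {
            if (pick < cumulativeWeights.get(i)) {
                return values.get(i);
            }
        }
        // Fallback in case floating-point rounding leaves pick at the upper bound.
        return values.get(values.size() - 1);
    }

    public static void main(String[] args) {
        // Assumed distribution: roughly 70% "US", 20% "DE", 10% "RS".
        WeightedValues<String> country = WeightedValues.<String>seeded(42L)
                .with("US", 70)
                .with("DE", 20)
                .with("RS", 10);
        for (int i = 0; i < 5; i++) {
            System.out.println(country.next());
        }
    }
}
```

The caller only states which values exist and how often they should occur. The sampling logic lives in one place, which is the kind of error-prone imperative code that a declarative approach is meant to remove from tests.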
The library is used in a few of our internal projects. We have also used it in a few client projects, where it was easier and faster than writing a project-specific implementation. We also integrated it into our Berserker project as one of the data sources.