Why fodder?

There are plenty of data generation tools and options out there, but none of them quite fit what we needed. So here's the user story:

As a data engineer, I'd like to quickly generate data for my tables or message stream.

Here are some key drivers for fodder:

  • A single binary. Easy to install and update.
  • Scales with hardware so it can utilise bigger and/or more machines.
  • Straightforward tool focused on just data generation.
  • Compose your specific solution with other tools.
  • Fairly fast and efficient, so it is useful for generating large data sets.
  • No exposing your schemas to third parties.
  • No external service, so it can run in secured, isolated environments.
  • Get going quickly. Minimal dependencies means minimal impediments.

You can use it in your shell as part of your local development workflow. You can use it in your CI/CD pipelines for testing. Remember to bake in failure modes, not just the happy path ;) You can put it in a container and run it at scale for integration and performance testing.

Other tools

Faker (JavaScript, Python, Ruby etc.)

These libraries are great, but you need to write your custom generator as its own thing. They're powerful, flexible and full-featured, but they take a lot of time to use and maintain effectively, and the expertise needed can be daunting. There is a lot of boilerplate just to get up and running. This doesn't scale well across projects, and the software development expertise and time required is a barrier for many teams and organisations. We wanted something that is easier to get going with and that addresses the common cases.
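
To give a feel for that boilerplate, here is a minimal sketch of a hand-written generator for a single table. It deliberately uses only the Python standard library rather than Faker itself, and the `orders` columns, value ranges and function names are all hypothetical, chosen purely for illustration.

```python
import csv
import io
import random
import uuid
from datetime import datetime, timedelta, timezone

# Hypothetical column layout for an illustrative "orders" table.
FIELDS = ["order_id", "customer_id", "amount", "created_at"]


def generate_orders(count, seed=0):
    """Yield `count` fake order rows, reproducible for a given seed."""
    rng = random.Random(seed)  # private RNG so runs are repeatable
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    for _ in range(count):
        offset = timedelta(seconds=rng.randrange(365 * 24 * 3600))
        yield {
            "order_id": str(uuid.UUID(int=rng.getrandbits(128))),
            "customer_id": rng.randint(1, 10_000),
            "amount": round(rng.uniform(1.0, 500.0), 2),
            "created_at": (start + offset).isoformat(),
        }


def to_csv(rows):
    """Serialise an iterable of row dicts to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(generate_orders(5)), end="")
```

Even this small script needs its own seeding, serialisation and schema handling, and each new table or output format adds more code to write, test and maintain.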

Mockaroo, GenRocket etc

These require commercial arrangements. They use a service outside our environment, which can be a complete showstopper for some organisations. Using them at scale, especially for performance testing, can be problematic or expensive.

JSONPlaceholder, MockServer, Mockbin

These are designed for mocking out an API, not for flexible data generation at scale.