A more complex example

This is a more complex example of how to use fodder to generate data. It keeps multiple schema files in the schemas/ directory, uses the data/ directory as a temporary location for data that is generated by one schema and used by another, and writes the final output to the tables/ directory.

All of this is controlled by two short Bash scripts, initialise and gen, which combine the fodder CLI with some standard Unix tools to generate the data.

initialise

This script initialises the data by generating the categories that will remain static in future runs. In a way, this can be thought of as generating the dimension (DIM) tables in a data warehouse.

#!/usr/bin/env bash
#
# Generate our primary tables

set -eo pipefail

# Number of rows to generate; defaults to 5 when no argument is given
if [ -z "$1" ]; then
    ROWS=5
else
    ROWS="$1"
fi

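echo "GENERATING SELLER DATA"
# Generate the ID and company columns as separate temporary files
fodder -s schemas/ID.fodder.yaml -n "$ROWS" -f csv > data/SELLER_ID.csv
fodder -s schemas/COMPANY.fodder.yaml -n "$ROWS" -f csv > data/SELLER_COMPANY.csv
# Join the two files line by line into the final seller table
paste -d "," data/SELLER_ID.csv data/SELLER_COMPANY.csv > tables/SELLER_ID_COMPANY.csv
# Keep a standalone copy of the IDs in tables/, then remove the temporary files
cp data/SELLER_ID.csv tables/SELLER_ID.csv
rm data/SELLER_ID.csv data/SELLER_COMPANY.csv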

echo "GENERATING BUYER DATA"
# Same steps as for the sellers, this time for the buyer tables
fodder -s schemas/ID.fodder.yaml -n "$ROWS" -f csv > data/BUYER_ID.csv
fodder -s schemas/COMPANY.fodder.yaml -n "$ROWS" -f csv > data/BUYER_COMPANY.csv
paste -d "," data/BUYER_ID.csv data/BUYER_COMPANY.csv > tables/BUYER_ID_COMPANY.csv
cp data/BUYER_ID.csv tables/BUYER_ID.csv
rm data/BUYER_ID.csv data/BUYER_COMPANY.csv
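
To see what the paste step produces without running fodder, the following is a minimal sketch that uses hand-written stand-in files. The column names (id, company) and the values are assumptions for illustration only; the real columns depend on what schemas/ID.fodder.yaml and schemas/COMPANY.fodder.yaml define.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Work in a throwaway directory so nothing in the project is touched.
workdir="$(mktemp -d)"

# Hypothetical stand-ins for the two files fodder would write to data/.
printf 'id\n1\n2\n' > "$workdir/SELLER_ID.csv"
printf 'company\nAcme Ltd\nGlobex\n' > "$workdir/SELLER_COMPANY.csv"

# paste joins the files line by line, so line N of the first file
# ends up next to line N of the second, separated by a comma.
merged="$(paste -d "," "$workdir/SELLER_ID.csv" "$workdir/SELLER_COMPANY.csv")"
printf '%s\n' "$merged"

# Clean up the temporary files.
rm -rf "$workdir"
```

Because paste only matches lines by position, both input files need the same number of lines. The initialise script takes care of this by passing the same "$ROWS" value to both fodder calls.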

gen

The gen script generates the main data, which changes each time the script is run. In a way, this can be thought of as generating the fact (FACT) tables in a data warehouse.

If we are generating data for a data warehouse, we would run the initialise script once and then run the gen script each time we want to generate new data.

This could be automated by running the gen script as a cron job or through a similar, more elaborate scheduling mechanism.
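
As a rough sketch of the cron approach, the snippet below builds a crontab entry and prints it for review rather than installing it. The project location (PROJECT_DIR), the 02:00 daily schedule, the row count of 20 and the gen.log file name are all assumptions chosen for illustration; none of them come from fodder itself.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical location of this example; adjust to wherever the project lives.
PROJECT_DIR="${PROJECT_DIR:-$HOME/fodder-example}"

# Cron fields are: minute hour day-of-month month day-of-week.
# "0 2 * * *" means 02:00 every day.
# The job changes into the project directory first, because gen uses
# relative paths (schemas/, tables/), and appends its output to a log file.
CRON_LINE="0 2 * * * cd $PROJECT_DIR && ./gen 20 >> gen.log 2>&1"

# Print the entry; installing it is left as a manual step via `crontab -e`.
printf '%s\n' "$CRON_LINE"
```

Note that gen redirects into tables/SALES.csv with `>`, so each scheduled run replaces the previous file rather than appending to it.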

#!/usr/bin/env bash
# 
# Generate latest sales data

set -eo pipefail

# Number of rows to generate; defaults to 20 when no argument is given
if [ -z "$1" ]; then
    ROWS=20
else
    ROWS="$1"
fi

echo "GENERATING SALES"
fodder -s schemas/SALES.fodder.yaml -f csv -n "$ROWS" > tables/SALES.csv
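
Both scripts read an optional row count from their first argument and fall back to a default (5 for initialise, 20 for gen). The same fallback can be expressed more compactly with Bash parameter expansion. The sketch below wraps it in a function so it is easy to try out; the name row_count is hypothetical and is not part of fodder or of the scripts above.

```shell
#!/usr/bin/env bash
set -eo pipefail

# row_count is a hypothetical helper used only for this illustration.
# Usage: row_count DEFAULT [VALUE]
# Prints VALUE when it is set and non-empty, otherwise DEFAULT.
row_count() {
    local default="$1"
    local value="${2:-}"
    printf '%s\n' "${value:-$default}"
}

row_count 20        # no value given: falls back to the default
row_count 20 100    # explicit value: overrides the default
```

Inside either script, the equivalent one-liner would be ROWS="${1:-20}" (or "${1:-5}" for initialise), which behaves the same as the if/else block for both a missing and an empty argument.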