Outputs#

All of CSV, XLSX, PARQUET and SQLITE data can be produced with the following.

flatterer games.json games_dir --xlsx --csv --sqlite --parquet

This will create the directory structure:

games_dir/
├── csv
│   ├── games.csv
│   └── platforms.csv
├── parquet
│   ├── games.parquet
│   └── platforms.parquet
├── datapackage.json
├── fields.csv
├── tables.csv
├── sqlite.db
├── output.xlsx
├── postgresql
│   ├── postgresql_load.sql
│   └── postgresql_schema.sql
└── sqlite
    ├── sqlite_load.sql
    └── sqlite_schema.sql

CSV#

The csv directory contains a CSV file representing a relational table. A new CSV file is produced for each one-to-many relationship found in the input JSON.

XLSX#

The XLSX output can be found in the output.xlsx file. It contains the same data as the CSV files with a sheet per table. Sheet names are truncated to not go over the allowed limit.

SQLITE#

A sqlite.db is created in the output directory where each csv file is put in its own table.

POSTGRES#

Data is created in the postgres database where each csv file is put in its own table.

PARQUET#

The parquet directory contains a parquet file representing a relational table.

fields.csv#

fields.csv contains some metadata about the output tables:

table_name	field_name	field_type	count
platforms	_link	text	3
platforms	_link_games	text	3
platforms	name	text	3
games	_link	text	2
games	_link_games	text	2
games	id	number	2
games	rating_code	text	2
games	rating_name	text	2
games	releaseDate	date	2
games	title	text	2

table_name: The name of the table, which will be the same as the CSV file name without the .csv extension, or the XLSX sheet name.
field_name: The name of field in the table. This will be the same as the heading line in the CSV file and XLSX sheet.
field_type: Type guess of the type of data within the field.
count: Amount of times that field appears int the JSON.

tables.csv#

tables.csv contains some metadata about the output tables:

table_name	table_title
platforms	platforms
games	platforms

table_name: The name of the table, which will be the same as the CSV file name without the .csv extension, or the XLSX sheet name.
table_title: This will be the same as table_name unless tables.csv input is used.

Postgresql Files#

Need option sql_scripts=True. When using a CSV output a postgresql directory is made which contains SQL to help load the CSV files into the database.

postgresql_schema.sql#

Contains SQL with the basic schema definitions for the tables created. The file looks like:

CREATE TABLE "games"(
    "_link" TEXT,
    "_link_games" TEXT,
    "id" NUMERIC,
    "title" TEXT,
    "releasedate" TIMESTAMP,
    "rating_code" TEXT,
    "rating_name" TEXT);

CREATE TABLE "platforms"(
    "_link" TEXT,
    "_link_games" TEXT,
    "name" TEXT);

Using the psql command line tool the schema can be loaded with something like:

psql postgresql://user:password@host/database -f games_dir/postgresql/postgresql_schema.sql

postgresql_load.sql#

This script imports the CSV files into the tables created by postgresql_schema.csv. This script requires the psql command line tool as it uses its \copy command but can easily be adapted to use the plain COPY command by removing the \. The advantage of using \copy is that it does not have to be run on the server where the database is. It looks like:

\copy "games" from 'csv/games.csv' with CSV HEADER
\copy "platforms" from 'csv/platforms.csv' with CSV HEADER

and can be run by:

psql postgresql://user:password@host/database -f games_dir_/postgresql/postgresql_load.sql

Sqlite Files.#

Need option sql_scripts=True. When using a CSV output a sqlite directory is made which contains SQL to help load the CSV files into a sqlite database.

sqlite_schema.sql#

Contains SQL with the basic schema definitions for the tables created. The file is the same as the postgresql schema file.

Using the sqlite3 command line tool the schema can be loaded with something like:

cat games_dir/sqlite/sqlite_schema.sql | sqlite3 my_database.db

sqlite_load.sql#

This script imports the CSV files into the tables created by sqlite_schema.csv. This script requires the sqlite3 command line tool as it uses its .import command. The contents looks like.

.mode csv 
.import 'csv/games.csv' games --skip 1 
.import 'csv/platforms.csv' platforms --skip 1 

and can be run by:

cat games_dir/sqlite/sqlite_load.sql | sqlite3 my_database.db

datapackage.json#

Contains metadata in the Tabular Datapackge Spec. I looks like:

{
  "profile": "tabular-data-package",
  "resources": [
    {
      "profile": "tabular-data-resource",
      "name": "games",
      "schema": {
        "fields": [
          {
            "name": "_link",
            "type": "text",
            "count": 2
          },
          {
            "name": "_link_games",
            "type": "text",
            "count": 2
          },
          {
            "name": "id",
            "type": "number",
            "count": 2
          },
          {
            "name": "title",
            "type": "text",
            "count": 2
          },
          {
            "name": "releaseDate",
            "type": "date",
            "count": 2
          },
          {
            "name": "rating_code",
            "type": "text",
            "count": 2
          },
          {
            "name": "rating_name",
            "type": "text",
            "count": 2
          }
        ],
        "primaryKey": "_link"
      },
      "path": "csv/games.csv"
    },
    {
      "profile": "tabular-data-resource",
      "name": "platforms",
      "schema": {
        "fields": [
          {
            "name": "_link",
            "type": "text",
            "count": 3
          },
          {
            "name": "_link_games",
            "type": "text",
            "count": 3
          },
          {
            "name": "name",
            "type": "text",
            "count": 3
          }
        ],
        "primaryKey": "_link"
      },
      "path": "csv/platforms.csv"
    }
  ]
}