# Python Library

## Usage 

```python
import flatterer
output = flatterer.flatten('games.json', 'games_dir')
```

If the first argument to `flatten` is a string (as in the above example), it represents the path to the input JSON file.  However, it is also possible to supply an iterator of python dicts:

i.e
```python
import flatterer
list_of_objects = [{"a", "a"}, {"a", "b"}]
output = flatterer.flatten(list_of_objects, 'games_dir')
```

For more complicated cases, where you want a stream of data, it is possible make a generator to feed data into flatterer.  This example uses `ijson` to stream all objects from a JSON array in a file:

```
import flatterer

def array_item_generator():
    with open('fixtures/basic.json', 'rb') as f:
        for item in ijson.items(f, 'item'):
            // you could manipulate items to modify data as it is going in.
            yield item

output = flatterer.flatten(array_item_generator(), 'games_dir')
```

This can be useful if you need to modify the data before it is processed, or if you are streaming data from a non-file source such as a database.

It is also possible to supply an iterator of bytes or strings that will be interpreted as JSON.

Also you can add the `files=True` argument to supply a list of file names.

```
import flatterer
output = flatterer.flatten(['games.json','games2.json'], 'games_dir', files=True)
```


All other options are the same as the command line tool described in [](options.md#option-reference).

### Output

The output from running `flatten` is a dict which contains metadata about the conversion:

```
>>> output = flatterer.flatten('games.json', 'games_dir', sqlite=True, xlsx=True)
>>> print(output)
{
    "fields": "DataFrame - table_name, field_name, field_type, fiel... (5 fields)"
    "tables": "DataFrame - table_name, table_title"
    "data": {
        "main": "File Path - games_dir/csv/main.csv"
        "developer": "File Path - games_dir/csv/developer.csv"
        "platforms": "File Path - games_dir/csv/platforms.csv"
    }
    "sqlite": "File Path - games_dir/sqlite.db"
    "xlsx": "File Path - games_dir/output.xlsx"
}
```

So output['fields'] contains a pandas DataFrame with information about the fields generated by flatterer:

```
>>> print(output['fields'])
   table_name   field_name field_type  field_title  count
0        main        _link       text        _link      2
1        main           id     number           id      2
2        main        title       text        title      2
3        main  releaseDate       date  releaseDate      2
4        main  rating_code       text  rating_code      2
5        main  rating_name       text  rating_name      2
6   developer        _link       text        _link      2
7   developer   _link_main       text   _link_main      2
8   developer         name       text         name      2
9   platforms        _link       text        _link      3
10  platforms   _link_main       text   _link_main      3
11  platforms         name       text         name      3
```

Similar for `output['tables']` showing the tables in the data.

```
>>> print(output['tables'])
  table_name table_title
0       main        main
1  developer   developer
2  platforms   platforms
```

- `output['data']` contains the locations of the CSV files if `csv=False` is not set.
- `output['sqlite']` contains the location of the sqlite database if `sqlite=True` is set.
- `output['xlsx']` contains the locatin of the xlsx file if `xlsx=True` is set.


### Creating pandas DataFrames

You can do this by setting `dataframe=True` 

This creates DataFrames for all tables generated. This option is for the python library only.

**Warning: This will cause issues with large datasets as the DataFrames will be put in memory**

```
>>> output = flatterer.flatten('games.json', dataframe=True)
>>> print(output)
{
    "fields": "DataFrame - table_name, field_name, field_type, fiel... (5 fields)"
    "tables": "DataFrame - table_name, table_title"
    "data": {
        "main": "DataFrame - _link, id, title, releaseDate, rating_co... (6 fields)"
        "developer": "DataFrame - _link, _link_main, name"
        "platforms": "DataFrame - _link, _link_main, name"
    }
}
```
As you can see with this option you do not need to supply an output directory and will work in your systems temporary space.

The `data` key in the output now contains dataframes, e.g

```
>>> print(output['data']['main'])
   _link  id   title releaseDate rating_code rating_name
0      0   1  A Game  2015-01-01           E    Everyone
1      1   2  B Game  2016-01-01           E    Everyone
```