# Python Library

## Usage

```python
import flatterer

output = flatterer.flatten('games.json', 'games_dir')
```

If the first argument to `flatten` is a string (as in the example above), it is the path to the input JSON file.

It is also possible to supply an iterator of Python dicts, for example:

```python
import flatterer

list_of_objects = [{"a": "a"}, {"a": "b"}]

output = flatterer.flatten(list_of_objects, 'games_dir')
```

For more complicated cases, where you want a stream of data, it is possible to make a generator to feed data into flatterer. This example uses `ijson` to stream all objects from a JSON array in a file:

```python
import flatterer
import ijson

def array_item_generator():
    with open('fixtures/basic.json', 'rb') as f:
        for item in ijson.items(f, 'item'):
            # you could manipulate items here to modify data as it is going in.
            yield item

output = flatterer.flatten(array_item_generator(), 'games_dir')
```

This can be useful if you need to modify the data before it is processed, or if you are streaming data from a non-file source such as a database.

It is also possible to supply an iterator of bytes or strings that will be interpreted as JSON.

You can also pass the `files=True` argument to supply a list of file names:

```python
import flatterer

output = flatterer.flatten(['games.json', 'games2.json'], 'games_dir', files=True)
```

All other options are the same as the command line tool described in [](options.md#option-reference).

### Output

The output from running `flatten` is a dict which contains metadata about the conversion:

```
>>> output = flatterer.flatten('games.json', 'games_dir', sqlite=True, xlsx=True)
>>> print(output)
{
  "fields": "DataFrame - table_name, field_name, field_type, fiel... (5 fields)"
  "tables": "DataFrame - table_name, table_title"
  "data": {
    "main": "File Path - games_dir/csv/main.csv"
    "developer": "File Path - games_dir/csv/developer.csv"
    "platforms": "File Path - games_dir/csv/platforms.csv"
  }
  "sqlite": "File Path - games_dir/sqlite.db"
  "xlsx": "File Path - games_dir/output.xlsx"
}
```

`output['fields']` contains a pandas DataFrame with information about the fields generated by flatterer:

```
>>> print(output['fields'])
    table_name   field_name  field_type  field_title  count
0         main        _link        text        _link      2
1         main           id      number           id      2
2         main        title        text        title      2
3         main  releaseDate        date  releaseDate      2
4         main  rating_code        text  rating_code      2
5         main  rating_name        text  rating_name      2
6    developer        _link        text        _link      2
7    developer   _link_main        text   _link_main      2
8    developer         name        text         name      2
9    platforms        _link        text        _link      3
10   platforms   _link_main        text   _link_main      3
11   platforms         name        text         name      3
```

Similarly, `output['tables']` contains a DataFrame listing the tables in the data:

```
>>> print(output['tables'])
   table_name  table_title
0        main         main
1   developer    developer
2   platforms    platforms
```

- `output['data']` contains the locations of the CSV files, unless `csv=False` is set.
- `output['sqlite']` contains the location of the SQLite database if `sqlite=True` is set.
- `output['xlsx']` contains the location of the xlsx file if `xlsx=True` is set.

### Creating pandas DataFrames

To get the flattened tables back as pandas DataFrames, set `dataframe=True`. This creates a DataFrame for every table generated. This option is available in the Python library only.

**Warning: this can cause problems with large datasets, because every DataFrame is held in memory.**

```
>>> output = flatterer.flatten('games.json', dataframe=True)
>>> print(output)
{
  "fields": "DataFrame - table_name, field_name, field_type, fiel... (5 fields)"
  "tables": "DataFrame - table_name, table_title"
  "data": {
    "main": "DataFrame - _link, id, title, releaseDate, rating_co... (6 fields)"
    "developer": "DataFrame - _link, _link_main, name"
    "platforms": "DataFrame - _link, _link_main, name"
  }
}
```

As you can see, with this option you do not need to supply an output directory; flatterer works in your system's temporary space instead. The `data` key in the output now contains DataFrames, e.g.

```
>>> print(output['data']['main'])
   _link  id   title  releaseDate  rating_code  rating_name
0      0   1  A Game   2015-01-01            E     Everyone
1      1   2  B Game   2016-01-01            E     Everyone
```
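
The Usage section mentions streaming data from a non-file source such as a database. As a rough sketch of what such a generator might look like, the example below yields rows from a SQLite query as Python dicts, one at a time. The helper `rows_as_dicts`, the `games` table and its contents are hypothetical and exist only for illustration; only the standard library `sqlite3` module is used.

```python
import sqlite3


def rows_as_dicts(connection, query):
    """Yield each row returned by `query` as a plain dict, one row at a time."""
    cursor = connection.execute(query)
    # cursor.description holds one tuple per result column; item 0 is its name.
    columns = [column[0] for column in cursor.description]
    for row in cursor:
        # Rows could be modified here before they are handed on.
        yield dict(zip(columns, row))


# Hypothetical in-memory table standing in for a real database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE games (id INTEGER, title TEXT)")
connection.executemany(
    "INSERT INTO games VALUES (?, ?)", [(1, "A Game"), (2, "B Game")]
)

rows = list(rows_as_dicts(connection, "SELECT id, title FROM games ORDER BY id"))
print(rows)
```

Because `flatten` accepts an iterator of dicts as its first argument, a generator like this could presumably be passed to it directly instead of being collected into a list, e.g. `flatterer.flatten(rows_as_dicts(connection, query), 'games_dir')`.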