csvs to json

CSVs to Single JSON

Please follow and like us:

While json.dump is normally used to produce a JSON file, you may have circumstances where json.dump is not the best option. In this example, I got data from multiple files that I wanted to put into a single JSON file. I wanted to avoid using pandas.

The data comes in files for each year, and it includes about 158 countries per year. I wanted to make a single JSON file that has all of the data.

The CSV data that I am using in this example is from Kaggle and the world happiness report.

I tried using csv.DictReader. CSV.DictReader does not seem to let me add an upper-level list to an eventual JSON string and then file.

I eventually want to make a list of all data for all the countries. It will feature an upper-level list of each year through 2019, but not including 2020.

The following tutorial requires os, csv, and json. All of these are in the standard python library. So, you won’t have to install them in your environment. However, you will still need to import them in your python file

file.

import os
import csv
import json

I renamed the downloaded CSV files as the year.csv. So, “2015.csv”, for example. Make a list of the file names. Place all of the files in the resources folder within that project folder.

fname=["2015.csv", "2016.csv", "2017.csv", "2018.csv", "2019.csv"]

Initialize three variables.

data=[]
json_str_start=""
json_str=""

Now you will need to iterate over the list of file names.
So, start with a for loop. Make sure to have a colon at the end of the statement.
for f in fname:

Then to ensure that it will look for files in the right place on any operating system, use os.path.join and have it look in the resources folder for the f filename. Indent all the lines after the for loop that need to be repeated.

data_path = os.path.join("resources", f)

Next, use a with statement to open the CSV file and close it automatically when the commands after the with statement are no longer indented. After that, read the data from the CSV file with the reader object.

with open(data_path, ‘r’) as csvfile: reader = csv.reader(csvfile, delimiter=’,’)

I want to take the file names and make them into an upper-level list for the JSON string. First, I remove .csv from the name and make the beginning of a JSON string with curly brackets and double-quotes. Properly formatted JSON strings require double-quotes. Therefore adding them involves the use of single quotes to surround them.

year=f.replace('.csv', '')
json_str_start='{"' + year +'": '

Now, make and initialize another variable to be an empty list
data_list=list()

Then, go row by row through the data and append it to the data_list.
for row in reader:
datalist.append(row)

data = [dict(zip(data_list[0],row)) for row in data_list]

After that, remove the first item in the list.
data.pop(0)

data.pop(0)

Print out the beginning of the string to make sure that it looks right.

print (json_str[0:1000])

Run the code if you haven’t yet to ensure that you constructed the string properly.

Set a path to the JSON file that you want to make.

data_path2 = os.path.join("resources", "file.json")

Finally, open a writeable text file with that path and call that open file jsonfile.

with open(data_path2, 'w+') as jsonfile:
jsonfile.write(json_str)

The data and code for this tutorial is available at: https://github.com/scottcm73/csvs_to_json_tutorial

Please follow and like us: