BYU Computer Science

Structured data and JSON

Imagine you are writing an application that keeps track of student data. You might store information about each student in a dictionary:

[Figure: a dictionary of students, showing first name, surname, age, and major]

Notice that each dictionary is structured identically. They each contain:

  • first name
  • surname
  • age
  • major

This is structured data

This means we could think of a generic “student” dictionary that has these keys. Then we could construct a list of students, each one represented by a dictionary:

students = [
    {'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'},
    {'firstname': 'Ulysses', 'surname': 'Bennion', 'age': 25, 'major': 'Mechanical Engineering'},
    {'firstname': 'Sarah', 'surname': 'Grover', 'age': 19, 'major': 'Mathematics'},
    {'firstname': 'Mary', 'surname': 'Han', 'age': 20, 'major': 'Nursing'},
    {'firstname': 'Jacob', 'surname': 'Smith', 'age': 18, 'major': 'Open Major'}
]

Now that we have a list of students, where each student is a dictionary, we can compute things on this data.

We can compute the oldest student

def get_oldest(students):
    oldest = None
    for student in students:
        if oldest is None or student['age'] > oldest['age']:
            oldest = student
    return oldest

oldest = get_oldest(students)
print(oldest)

This prints:

{'firstname': 'Ulysses', 'surname': 'Bennion', 'age': 25, 'major': 'Mechanical Engineering'}
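
The same search can also be sketched with Python's built-in max(), which accepts a key function that says what to compare. This is only an alternative to the loop, and the name get_oldest_with_max is an illustrative choice, not part of the original code. The default=None argument is there so an empty list gives None, like the loop version does.

```python
students = [
    {'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'},
    {'firstname': 'Ulysses', 'surname': 'Bennion', 'age': 25, 'major': 'Mechanical Engineering'},
    {'firstname': 'Sarah', 'surname': 'Grover', 'age': 19, 'major': 'Mathematics'},
]


def get_oldest_with_max(students):
    # key tells max() to compare students by their age;
    # default=None is returned when the list is empty
    return max(students, key=lambda student: student['age'], default=None)


oldest = get_oldest_with_max(students)
print(oldest['firstname'], oldest['age'])  # Ulysses 25
```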

We can check whether we have any math majors

# Do we have any math majors?
def has_math_major(students):
    for student in students:
        if student['major'] == 'Mathematics':
            return True
    return False

print(has_math_major(students))

This prints:

True
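
A shorter way to ask the same yes-or-no question is the built-in any(), which returns True as soon as one item matches. The sketch below generalizes the check to any major; the function name has_major and its major parameter are assumptions made for illustration.

```python
students = [
    {'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'},
    {'firstname': 'Sarah', 'surname': 'Grover', 'age': 19, 'major': 'Mathematics'},
    {'firstname': 'Mary', 'surname': 'Han', 'age': 20, 'major': 'Nursing'},
]


def has_major(students, major):
    # any() stops at the first student whose major matches
    return any(student['major'] == major for student in students)


print(has_major(students, 'Mathematics'))  # True
print(has_major(students, 'Physics'))      # False
```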

Data abstraction

What we are discovering here is the concept of data abstraction. A “student” is a collection of information about that student. We could imagine putting lots more data into a student record. For example:

  • department
  • class standing
  • home town
  • current residence
  • student ID
  • gender
  • nationality
  • languages spoken

But we really only need whatever our program needs.

When creating a data abstraction, the point is not to define all the possible properties that might apply in real life, but to define the set of properties needed by your program.

We can call the definition of what goes into a student a type definition, schema, or shape.
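
One way to write such a schema down in Python is typing.TypedDict, which lists the expected keys and their types. This is a sketch of one option, not something these notes require. Python does not enforce a TypedDict at run time; it documents the shape for readers and for type-checking tools.

```python
from typing import TypedDict


class Student(TypedDict):
    # the keys our program needs, and the type of each value
    firstname: str
    surname: str
    age: int
    major: str


# at run time this is still an ordinary dictionary
juan: Student = {'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'}

print(sorted(Student.__annotations__))  # ['age', 'firstname', 'major', 'surname']
print(type(juan).__name__)              # dict
```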

A quiz

What properties does a student need to have for this code to work?

def print_eligible_students(students):
    # Students must be part of the Physics major and be at least 21 years old
    for student in students:
        if student['major'] == 'Physics' and student['age'] >= 21:
            print(f"{student['last']}, {student['first']} ({student['standing']})")

work on this with a friend

By examining this code, you should be able to identify that the student dictionary has the following keys:

  • major (string)
  • age (numeric)
  • last
  • first
  • standing

It could have other keys. But this is the set of keys it needs for the above code to work. If any of these keys is missing from a student, you will get a KeyError as soon as the code tries to read it.
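
To see what a missing key looks like in practice, here is a small sketch. The helper missing_keys, the constant REQUIRED_KEYS, and the sample student are all made up for illustration.

```python
REQUIRED_KEYS = {'major', 'age', 'last', 'first', 'standing'}


def missing_keys(student):
    # iterating over a dictionary yields its keys, so this is
    # "required keys that the student does not have"
    return REQUIRED_KEYS.difference(student)


# a made-up student record with no 'standing' key
student = {'first': 'Ada', 'last': 'Park', 'major': 'Physics', 'age': 22}

print(missing_keys(student))  # {'standing'}

try:
    print(student['standing'])
except KeyError as error:
    print('missing key:', error)  # missing key: 'standing'
```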

JSON

Imagine you want to take all of the data from a Python dictionary and send it to your friend. Or maybe you want to export it to a file, so that you can read it into a different program (maybe one that tracks alumni).

JSON (JavaScript Object Notation) is one of the most commonly used formats for transferring data between programs.

Writing a JSON file

Here is how we can write a Python list of dictionaries to a JSON file:

import json

students = [
    {'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'},
    {'firstname': 'Ulysses', 'surname': 'Bennion', 'age': 25, 'major': 'Mechanical Engineering'},
    {'firstname': 'Sarah', 'surname': 'Grover', 'age': 19, 'major': 'Mathematics'},
    {'firstname': 'Mary', 'surname': 'Han', 'age': 20, 'major': 'Nursing'},
    {'firstname': 'Jacob', 'surname': 'Smith', 'age': 18, 'major': 'Open Major'}
]

with open('students.json', 'w') as file:
    json.dump(students, file)

Notice that we need to

  • import json: import the json library
  • json.dump: write a Python object (here, a list of dictionaries) to an open file as JSON

If you run this code, it creates a file called students.json that contains:

[{"firstname": "Juan", "surname": "Lopez", "age": 18, "major": "Linguistics"},
{"firstname": "Ulysses", "surname": "Bennion", "age": 25, "major": "Mechanical Engineering"},
{"firstname": "Sarah", "surname": "Grover", "age": 19, "major": "Mathematics"},
{"firstname": "Mary", "surname": "Han", "age": 20, "major": "Nursing"},
{"firstname": "Jacob", "surname": "Smith", "age": 18, "major": "Open Major"}]

We have formatted this with line breaks so you can see it better, but in the file, it is just one long line.
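
To check this without opening the file, you can use json.dumps() (with an s), which returns the JSON text as a Python string instead of writing it to a file. A minimal sketch, using a shortened student list:

```python
import json

students = [
    {'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'},
    {'firstname': 'Mary', 'surname': 'Han', 'age': 20, 'major': 'Nursing'},
]

# dumps() returns a string; dump() writes to a file
text = json.dumps(students)

print(type(text).__name__)  # str
print('\n' in text)         # False, so there are no line breaks
```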

If you would like a file that is easier to read, you can use the keyword argument indent=2:

import json

students = [
    {'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'},
    {'firstname': 'Ulysses', 'surname': 'Bennion', 'age': 25, 'major': 'Mechanical Engineering'},
    {'firstname': 'Sarah', 'surname': 'Grover', 'age': 19, 'major': 'Mathematics'},
    {'firstname': 'Mary', 'surname': 'Han', 'age': 20, 'major': 'Nursing'},
    {'firstname': 'Jacob', 'surname': 'Smith', 'age': 18, 'major': 'Open Major'}
]

with open('students.json', 'w') as file:
    json.dump(students, file, indent=2)

This will now create students.json as shown:

[
  {
    "firstname": "Juan",
    "surname": "Lopez",
    "age": 18,
    "major": "Linguistics"
  },
  {
    "firstname": "Ulysses",
    "surname": "Bennion",
    "age": 25,
    "major": "Mechanical Engineering"
  },
  {
    "firstname": "Sarah",
    "surname": "Grover",
    "age": 19,
    "major": "Mathematics"
  },
  {
    "firstname": "Mary",
    "surname": "Han",
    "age": 20,
    "major": "Nursing"
  },
  {
    "firstname": "Jacob",
    "surname": "Smith",
    "age": 18,
    "major": "Open Major"
  }
]

Notice how this looks almost exactly like a Python list of dictionaries. :-) The most visible difference is that JSON always uses double quotes around strings.
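
A few Python values are spelled differently in JSON. The sketch below uses a made-up record (the keys enrolled, minor, and scores are not part of the student schema above) to show that True becomes true, None becomes null, and a tuple is written as a JSON array, which loads back as a list.

```python
import json

# a made-up record, chosen only to show the differences
record = {'name': 'Juan', 'enrolled': True, 'minor': None, 'scores': (90, 85)}

text = json.dumps(record)
print(text)
# {"name": "Juan", "enrolled": true, "minor": null, "scores": [90, 85]}

# loading converts back to Python values, but the tuple is now a list
back = json.loads(text)
print(back['enrolled'], back['minor'], back['scores'])  # True None [90, 85]
```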

Reading a JSON file

You can likewise load a JSON file in Python:

import json

with open('students.json') as file:
    student_info = json.load(file)

print(student_info)

This will print:

[{'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'},
{'firstname': 'Ulysses', 'surname': 'Bennion', 'age': 25, 'major': 'Mechanical Engineering'},
{'firstname': 'Sarah', 'surname': 'Grover', 'age': 19, 'major': 'Mathematics'},
{'firstname': 'Mary', 'surname': 'Han', 'age': 20, 'major': 'Nursing'},
{'firstname': 'Jacob', 'surname': 'Smith', 'age': 18, 'major': 'Open Major'}]

You could print the first student with:

print(student_info[0])

This will print:

{'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'}
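
Writing and then reading should give back the same data. Here is a sketch of that round trip. It writes to a temporary folder (an assumption made so the example does not overwrite your own students.json) and compares the loaded list with the original.

```python
import json
import os
import tempfile

students = [
    {'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'},
    {'firstname': 'Sarah', 'surname': 'Grover', 'age': 19, 'major': 'Mathematics'},
]

# the temporary folder is deleted automatically at the end of the with block
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'students.json')

    with open(path, 'w') as file:
        json.dump(students, file, indent=2)

    with open(path) as file:
        loaded = json.load(file)

print(loaded == students)  # True
```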

Example: Pokemon

You are given a large file, pokedex.json, which contains information on many Pokemon. Each Pokemon in the file has the following schema:

  • id (integer)
  • name (dictionary)
    • english
    • japanese
    • chinese
    • french
  • type (list of strings)
  • base (dictionary)
    • HP (integer)
    • Attack (integer)
    • Defense (integer)
    • Sp. Attack (integer)
    • Sp. Defense (integer)
    • Speed (integer)

Here is an example:

{
  "id": 242,
  "name": {
    "english": "Blissey",
    "japanese": "ハピナス",
    "chinese": "幸福蛋",
    "french": "Leuphorie"
  },
  "type": ["Normal"],
  "base": {
    "HP": 255,
    "Attack": 10,
    "Defense": 10,
    "Sp. Attack": 75,
    "Sp. Defense": 135,
    "Speed": 55
  }
}

Largest HP

Let’s write a function to find the Pokemon with the largest HP.

  • If pokemon is a variable holding the dictionary for a single Pokemon, what is the expression to find the HP of that Pokemon?

work on this with a friend

You can use pokemon['base']['HP'] to get the HP of a pokemon.
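
The same pattern of chaining square brackets works for every nested value. A small sketch, using a trimmed copy of the Blissey example (only two of the four name translations are kept here):

```python
pokemon = {
    'id': 242,
    'name': {'english': 'Blissey', 'french': 'Leuphorie'},
    'type': ['Normal'],
    'base': {
        'HP': 255, 'Attack': 10, 'Defense': 10,
        'Sp. Attack': 75, 'Sp. Defense': 135, 'Speed': 55,
    },
}

# first index picks the inner dictionary, second picks the value in it
print(pokemon['base']['HP'])          # 255
print(pokemon['name']['english'])     # Blissey
print(pokemon['base']['Sp. Attack'])  # 75

# 'type' holds a list, so the second index is a position
print(pokemon['type'][0])             # Normal
```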

Here is the code for this function:

import json


def find_largest_hp(pokedex):
    # keep track of the largest
    largest_hp = None
    # loop through all of the pokemon
    for pokemon in pokedex:
        # check if this is the largest
        if largest_hp is None or pokemon['base']['HP'] > largest_hp['base']['HP']:
            # store the largest we have seen so far
            largest_hp = pokemon
    return largest_hp


# open a JSON file with all the pokemon
with open('pokedex.json') as file:
    # load the file into a list of dictionaries
    pokedex = json.load(file)

largest_hp = find_largest_hp(pokedex)
print(largest_hp)

Notice that we can use for ... in to loop through all the pokemon in the pokedex. Because the loaded pokedex is a list, the loop visits the pokemon in the order they appear in the file. Be sure to download pokedex.json before you run this code.

Your code should print:

{'id': 242, 'name': {'english': 'Blissey', 'japanese': 'ハピナス', 'chinese': '幸福蛋', 'french': 'Leuphorie'}, 'type': ['Normal'], 'base': {'HP': 255, 'Attack': 10, 'Defense': 10, 'Sp. Attack': 75, 'Sp. Defense': 135, 'Speed': 55}}

Fewest members

Which Pokemon type has the fewest members?

We need an algorithm that looks like this:

  • load the Pokemon from a JSON file
  • group all of the Pokemon by type
  • loop through all of the types, finding the one with the fewest Pokemon in it
  • print out the smallest group

Here is code that does this:

import json


def find_rarest_type(pokedex):
    groups = group_by_type(pokedex)
    return find_smallest_group(groups)


with open('pokedex.json') as file:
    pokedex = json.load(file)

name, group = find_rarest_type(pokedex)
print(name, len(group))

Notice that we have two functions we have not written yet — group_by_type() and find_smallest_group(). Thinking this way helps us write out the structure of the algorithm first, and then we can fill in the details of these two functions.

Now, this has two major pieces left:

  • How do we group Pokemon by types?
  • How do we find the smallest group?

work on this with a friend

Grouping by types

To group Pokemon by types, we need a dictionary:

{
    type : [list of Pokemon of that type]
}

Here is a function that does that:

def group_by_type(pokedex):
    # create an empty dictionary
    groups = {}
    # go through all the Pokemon
    for pokemon in pokedex:
        # go through all the types that this Pokemon belongs in
        for tp in pokemon['type']:
            # if this type is not in the dictionary, add it
            if tp not in groups:
                groups[tp] = []
            # append this Pokemon to the list of Pokemon for that type
            groups[tp].append(pokemon)
    return groups

Note that we are using tp instead of type for the variable because type is the name of a built-in function in Python, and reusing the name would hide that built-in inside this function.
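
To see what the grouped dictionary looks like, here is a sketch that runs the same idea on three hand-written entries (trimmed to the keys the function reads, not taken from pokedex.json). It uses collections.defaultdict as an alternative to the "if tp not in groups" check; the name group_by_type_defaultdict is an illustrative choice.

```python
from collections import defaultdict

# a tiny hand-written sample with only the keys we need
sample = [
    {'name': {'english': 'Bulbasaur'}, 'type': ['Grass', 'Poison']},
    {'name': {'english': 'Charmander'}, 'type': ['Fire']},
    {'name': {'english': 'Charizard'}, 'type': ['Fire', 'Flying']},
]


def group_by_type_defaultdict(pokedex):
    # a defaultdict(list) creates an empty list the first time a key is used
    groups = defaultdict(list)
    for pokemon in pokedex:
        for tp in pokemon['type']:
            groups[tp].append(pokemon)
    # convert back to a plain dictionary before returning
    return dict(groups)


groups = group_by_type_defaultdict(sample)
for tp, group in groups.items():
    print(tp, [pokemon['name']['english'] for pokemon in group])
# Grass ['Bulbasaur']
# Poison ['Bulbasaur']
# Fire ['Charmander', 'Charizard']
# Flying ['Charizard']
```

Notice that a Pokemon with two types appears in two groups.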

Finding the smallest group

Here is a function that finds the smallest group, using the groups dictionary from above:

def find_smallest_group(groups):
    # smallest and smallest type are None to start with
    smallest = None
    smallest_type = None
    # go through all of the Pokemon types and their Pokemon
    for tp, group in groups.items():
        # if this is the smallest so far, keep track of it
        if smallest is None or len(group) < len(smallest):
            smallest = group
            smallest_type = tp

    # return the smallest type and the group of Pokemon that are in this type
    return smallest_type, smallest

Note that we could have kept track of the count of Pokemon instead of a list of the actual Pokemon for each type. We use a list because maybe someday we want to add functionality that prints out the list of Pokemon of this type.
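
For comparison, here is a sketch of the count-only approach using collections.Counter. The function names and the hand-written sample are assumptions made for illustration. Like the loop version, ties go to the type seen first; unlike it, this sketch raises a ValueError if the pokedex is empty.

```python
from collections import Counter

# a tiny hand-written sample with only the key we need
sample = [
    {'type': ['Grass', 'Poison']},
    {'type': ['Grass', 'Poison']},
    {'type': ['Fire']},
    {'type': ['Fire']},
    {'type': ['Water']},
]


def count_by_type(pokedex):
    counts = Counter()
    for pokemon in pokedex:
        # update() adds one to the count of each type in the list
        counts.update(pokemon['type'])
    return counts


def find_rarest_type_by_count(pokedex):
    counts = count_by_type(pokedex)
    # each item is a (type, count) pair; compare by the count
    return min(counts.items(), key=lambda item: item[1])


name, count = find_rarest_type_by_count(sample)
print(name, count)  # Water 1
```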

Running the code

If you put these two functions into the above code, and run it, then you should get:

Ice 34