Data comes in all shapes and sizes.
While many of us spend most of our data education and careers working with data in relatively "friendly" formats, such as spreadsheets and CSV files, there may come a time when you’re confronted with data that isn’t so friendly. You might not even be able to visualize it straight out of the box.
This happened to me recently, when a computer model I was running output data in a gridded binary format. The tricky thing about binary files is figuring out how to read them so you can access and analyze the data they contain. After scouring the edges of the internet for a solution, I cobbled together a simple Python function that reads gridded binary data so that it can later be analyzed using your favorite Python libraries, such as NumPy or matplotlib.
This niche solution will allow you to read gridded binary data files with the .gdat extension produced by computer models, particularly those modeling natural processes such as environmental or meteorological phenomena. As such, the code below makes these assumptions:
- Your GDAT file follows the conventions of GrADS, the Grid Analysis and Display System (though this approach will likely work for other binary grid files as well).
- Your GDAT file represents a gridded study area over a specified study period.
- In the GDAT file, there is a grid of data values for each day in the specified study period.
- Each cell in the grid of data values contains a tuple of values.
- The grid of data values has a set number of rows and columns that can be used to index cells.
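Taken together, these assumptions fix where every value sits in the file. Assuming the values are stored day by day, then row by row, with the column index changing fastest (the same order the reading function below assumes), you can compute the byte position of any cell and jump straight to it with seek() instead of reading the whole file. This is a minimal sketch: value_offset and read_single_value are hypothetical helper names, and the sketch assumes the file contains nothing but these values (no header, no Fortran record markers, no other variables).

```python
import struct


def value_offset(day, row, column, number_rows, number_columns,
                 format_string='f'):
    """Byte position of one cell value, assuming day -> row -> column order."""
    value_size = struct.calcsize(format_string)  # 4 bytes for 'f'
    # Count how many values come before this one, then convert to bytes
    index = (day * number_rows + row) * number_columns + column
    return index * value_size


def read_single_value(file_path, day, row, column, number_rows,
                      number_columns, format_string='f'):
    """Read one cell value by seeking directly to its position."""
    value_size = struct.calcsize(format_string)
    with open(file_path, 'rb') as f:
        f.seek(value_offset(day, row, column, number_rows,
                            number_columns, format_string))
        return struct.unpack(format_string, f.read(value_size))[0]
```

For the 45-row, 108-column grid used in the example later in this article, value_offset(1, 30, 90, 45, 108) is (1 × 45 + 30) × 108 + 90 = 8,190 values into the file, or 32,760 bytes.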

Reading the binary GDAT file
```python
import struct


def read_gdat_file(file_path, format_string, number_rows,
                   number_columns, number_days):
    data = []
    with open(file_path, 'rb') as f:
        for _ in range(number_days):
            day_data = []
            for _ in range(number_rows):
                row_data = []
                for _ in range(number_columns):
                    value = struct.unpack(format_string, f.read(4))[0]
                    row_data.append(value)
                day_data.append(row_data)
            data.append(day_data)
    return data
```
The above code reads a binary GDAT file and structures its data to resemble the grid of your study area for easier interpretation and analysis.
- import struct: struct is a Python module that allows you to work with binary data. This module contains functions that allow you to convert binary data into Python objects and vice versa.
- def read_gdat_file(file_path, format_string, number_rows, number_columns, number_days): This line begins the function that reads the binary file. It takes the path to the GDAT file, the format string describing how each value is stored, the number of rows and columns representing the study area, and the number of days the GDAT data covers. The function needs the number of days, rows, and columns to partition the stream of binary values into one grid per day that matches the layout of your study area, which is what makes accurate analysis possible later on. You should be able to find these numbers in the parameters of whatever computer model you're using to generate the GDAT data (for GrADS output, the descriptor (.ctl) file typically lists them in its XDEF, YDEF, and TDEF entries).
- data = []: This line initializes an empty Python list that will be used to contain the GDAT data in its final grid format.
- with open(file_path, 'rb') as f: This line opens the file in binary read mode ('rb'), allowing the function to access its raw bytes. Opening the file with a with statement ensures that it is closed automatically once the block finishes, even if an error occurs while reading.
- for _ in range(number_days): This for loop runs once for each day in the study period, reading that day's grid from the binary data. I've used an underscore as the loop variable here (and in the following loops) because the counter itself is never used. You can use typical counter names, such as i or j, if they better suit your programming style.
- day_data = []: This line initializes an empty Python list that will be used to contain the binary data for each day. It will contain all of the rows of binary data relating to that specific day.
- for _ in range(number_rows): This for loop iterates through the specified number of rows within the specified day.
- row_data = []: This line initializes an empty Python list that will be used to contain the binary data for the current row within the specified day.
- for _ in range(number_columns): This for loop iterates through the data found in the specified number of columns within the specified row.
- value = struct.unpack(format_string, f.read(4))[0]: This line reads the next 4 bytes from the GDAT file and, using the unpack function from the struct module, interprets them according to the format_string you specified (see the "Format Characters" section of Python's struct documentation to work out which format string you need). Four bytes is the size of one single-precision float ('f'), the most common case. If your format string describes a different number of bytes, for example 'd' (an 8-byte double), replace 4 with struct.calcsize(format_string); otherwise the bytes read won't match the format and unpack will raise a struct.error. The unpack function always returns a tuple, so [0] at the end of the line keeps only its first (and, for a single-value format, only) element. If each cell in your modeled study area contains a tuple with multiple values, as when the measured quantity has x and y components (e.g., wind), use a format string that covers all of them, such as 'ff', read struct.calcsize(format_string) bytes, and drop [0] unless you're only interested in one of the cell values.
- row_data.append(value): This line appends the unpacked value to row_data, which represents the current row.
- day_data.append(row_data): This line appends the current row to day_data, which represents the current day.
- data.append(day_data): This line appends the data for the current day to data, which represents the overall dataset.
- return data: Once the loops have read the grid for every day, this line returns data, a Python list holding one grid per day of the study period. You can index it as data[day][row][column] and analyze it further.
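The f.read(4) call in the function above only lines up with format strings that describe exactly 4 bytes. Below is a minimal sketch of a variant, assuming the same day, row, column layout, that sizes its reads with struct.calcsize, unpacks a whole row at once with struct.iter_unpack, and checks the file size against the dimensions before reading, which catches wrong row, column, or day counts early. The name read_gdat_file_sized is hypothetical, and the size check assumes the file holds only this one grid per day (no header, record markers, or extra variables); relax it if your model writes anything else.

```python
import os
import struct


def read_gdat_file_sized(file_path, format_string, number_rows,
                         number_columns, number_days):
    """Read a GDAT file into data[day][row][column].

    Each cell holds a single value, or a tuple if format_string
    describes several values (e.g. 'ff' for x and y wind components).
    """
    value_size = struct.calcsize(format_string)  # bytes per cell
    expected = value_size * number_rows * number_columns * number_days
    actual = os.path.getsize(file_path)
    if actual != expected:
        raise ValueError(f"{file_path} has {actual} bytes, but the given "
                         f"dimensions imply {expected}")

    data = []
    with open(file_path, 'rb') as f:
        for _ in range(number_days):
            day_data = []
            for _ in range(number_rows):
                # One read per row; iter_unpack yields one tuple per cell
                row_bytes = f.read(value_size * number_columns)
                row_data = [cell if len(cell) > 1 else cell[0]
                            for cell in struct.iter_unpack(format_string,
                                                           row_bytes)]
                day_data.append(row_data)
            data.append(day_data)
    return data
```

Called with format_string='f', it returns the same nested lists as read_gdat_file; with 'ff', each cell becomes a pair of values.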
Returning the data for a particular cell in the study area grid for the entire study period
While your computer model likely produces data for a large study area, you may only be interested in analyzing the data for a particular cell within the grid across the entire study period.
Say, for example, you want to see how closely the wind speed values produced by the computer model match observed wind speed values, and a meteorological station with wind speed observations lies within a particular cell. We will extract the data for the cell containing the meteorological station for the entire study period, after which you will be able to plot the observed data against the modeled data to determine how accurate the model is.
The Python function below uses the Python list data returned from the previous function.
```python
def read_cell_data_for_study_period(data, row_index, column_index):
    cell_data = []
    for day_data in data:
        # Optional: flips the grid vertically; if you omit this line,
        # use day_data instead of reversed_day_data below
        reversed_day_data = day_data[::-1]
        cell_value = reversed_day_data[row_index][column_index]
        cell_data.append(cell_value)
    return cell_data
```
The above code extracts the specified cell data for the entire study period.
- def read_cell_data_for_study_period(data, row_index, column_index): This line begins the function that will extract the cell data for a specified cell using a row index and a column index to specify the cell’s location. The data argument takes the variable containing the list holding the GDAT data in its final grid format (this was created using the previous function). The row_index and column_index arguments take the integers specifying the row and column where the cell of interest is located.
- cell_data = []: This line initializes an empty Python list that will contain the cell data for the entire study period.
- for day_data in data: This for loop iterates through the gridded data for each day of the study period.
- reversed_day_data = day_data[::-1]: This optional line is for cases where, after printing the cell data for the study period, you find that the grid is not being read from the correct starting point. Row and column indices both start at 0, but which row counts as row 0 depends on how the model wrote the file. In many gridded formats, row 0 is the top row, so indices count from the upper left corner. In others, row 0 is the bottom row and indices count from the lower left corner; GrADS, for instance, assumes by default that rows run from south to north. If the grid starts at the lower left corner but you count rows from the top, your row_index will select the wrong cell. This line flips each day's grid vertically so that row 0 is the top row. Note: Use this line only if you have determined that your grid is read from the lower left corner. If your grid is already read correctly, omit it (and use day_data in the next line) to avoid reading the wrong cells.
- cell_value = reversed_day_data[row_index][column_index]: This line initializes a variable called cell_value which will contain the cell data at the specified row and column index for each day of the study period. As you can see, your specified row_index and column_index arguments are used to access the correct cell in the gridded data.
- cell_data.append(cell_value): This line appends the cell data for the current day to cell_data, which represents the overall list containing all of the cell values for the entire study period.
- return cell_data: After the loop has collected the value at your specified cell for every day, this line returns cell_data, which you can print and analyze for each day of the study period.
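Reversing each day's grid with day_data[::-1] builds a new list of rows for every day, and switching the flip on or off means editing the function. A sketch of an alternative, assuming the same nested-list structure returned by read_gdat_file, computes the stored row position directly and makes the flip an explicit option; the function name read_cell_data_flippable and its flip_rows parameter are hypothetical.

```python
def read_cell_data_flippable(data, row_index, column_index, flip_rows=False):
    """Collect one cell's value for every day of the study period.

    With flip_rows=True, row_index is counted from the top of the map
    while the stored grid starts at the bottom row, matching the effect
    of indexing day_data[::-1].
    """
    cell_data = []
    for day_data in data:
        number_rows = len(day_data)
        if not 0 <= row_index < number_rows:
            raise IndexError(f"row_index {row_index} is outside 0..{number_rows - 1}")
        # Convert a top-down row index into the stored bottom-up position
        stored_row = number_rows - 1 - row_index if flip_rows else row_index
        cell_data.append(day_data[stored_row][column_index])
    return cell_data
```

The explicit bounds check also catches negative row indices, which Python would otherwise silently interpret as counting from the end of the list.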
Example of how you can analyze cell data
```python
import struct
import matplotlib.pyplot as plt

# Function that reads the binary file (see above)
def read_gdat_file(file_path, format_string, number_rows,
                   number_columns, number_days):
    data = []
    with open(file_path, 'rb') as f:
        for _ in range(number_days):
            day_data = []
            for _ in range(number_rows):
                row_data = []
                for _ in range(number_columns):
                    value = struct.unpack(format_string, f.read(4))[0]
                    row_data.append(value)
                day_data.append(row_data)
            data.append(day_data)
    return data

# Function that returns the data for a specific cell for the entire study
# period (see above)
def read_cell_data_for_study_period(data, row_index, column_index):
    cell_data = []
    for day_data in data:
        reversed_day_data = day_data[::-1]  # Optional (see above)
        cell_value = reversed_day_data[row_index][column_index]
        cell_data.append(cell_value)
    return cell_data

# Specifying the file path to the binary file, wherever it's located
# on your system; also, specifying the format_string for the file.
file_path_binary_data = "file-path-binary-data.gdat"
format_string = 'f'

# Specifying the number of rows, columns, and days represented in the
# binary file
number_rows_in_gridded_study_area = 45
number_columns_in_gridded_study_area = 108
number_days_in_study_period = 365

# Reading the binary file
data = read_gdat_file(
    file_path=file_path_binary_data,
    format_string=format_string,
    number_rows=number_rows_in_gridded_study_area,
    number_columns=number_columns_in_gridded_study_area,
    number_days=number_days_in_study_period)

# Specifying the row and column index used to read the values from
# a specific cell. These indices must be smaller than the number of
# rows and columns in the study area (above), since indexing starts at 0.
row_index = 30
column_index = 90

# Reading the cell data for each day in the study period
data_for_specific_cell_for_study_period = read_cell_data_for_study_period(
    data=data,
    row_index=row_index,
    column_index=column_index)

# Plotting the cell data for each day in the study period (days numbered from 1)
plt.figure(figsize=(10, 6))
plt.plot(range(1, len(data_for_specific_cell_for_study_period) + 1),
         data_for_specific_cell_for_study_period,
         label='Simulated Data',
         color='blue')
plt.xlabel('Day')
plt.ylabel('Unit of simulated data')
plt.title('Simulated data at specified cell for study period')
plt.legend()
plt.show()
```
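The example above hard-codes row_index = 30 and column_index = 90 and plots only the simulated values. To compare them with a meteorological station's observations, as described earlier, you need to locate the station's cell and then summarize the differences. The sketch below does both under stated assumptions: station_to_cell assumes a regular longitude/latitude grid whose first cell is centered on (lon_start, lat_start), with rows running from south to north (for GrADS output, these values typically come from the XDEF and YDEF LINEAR entries of the descriptor (.ctl) file), and compare_to_observations assumes a list of observations aligned day by day with the simulated values, with None for missing days. Both function names are hypothetical, and mean bias and root-mean-square error (RMSE) are just two common summary statistics.

```python
import math


def station_to_cell(lon, lat, lon_start, lat_start, lon_step, lat_step,
                    number_rows, number_columns):
    """Return (row, column) of the cell containing a station.

    Assumes cell (0, 0) is centred on (lon_start, lat_start), columns
    increase eastward, and rows increase northward (south to north).
    """
    # Offset from the first cell centre in cells, rounded to the nearest cell
    column = math.floor((lon - lon_start) / lon_step + 0.5)
    row = math.floor((lat - lat_start) / lat_step + 0.5)
    if not (0 <= row < number_rows and 0 <= column < number_columns):
        raise ValueError("Station lies outside the modelled grid")
    return row, column


def compare_to_observations(simulated, observed):
    """Return (mean_bias, rmse) of simulated minus observed values.

    Both lists must cover the same days in the same order; days whose
    observation is None are skipped.
    """
    if len(simulated) != len(observed):
        raise ValueError("simulated and observed must have the same length")
    errors = [s - o for s, o in zip(simulated, observed) if o is not None]
    if not errors:
        raise ValueError("No days with observations to compare")
    mean_bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean_bias, rmse
```

If you flip each day's grid with day_data[::-1], as read_cell_data_for_study_period does, row 0 becomes the northernmost row, so pass number_rows - 1 - row as row_index. A positive mean bias means the model overestimates on average, and RMSE summarizes the size of the daily errors in the data's own units.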
Troubleshooting
- Read your computer model documentation to understand how its output is formatted. This will help you determine which values you want to extract from the tuple of data representing each cell, as well as how the cell values are stored (e.g., as 32-bit floating point numbers) and in which byte order.
- If possible, create TIF files from your GDAT files and open them in a GIS program. This will allow you to visualize your gridded data, as well as to check that your gridded data is being read from the upper left corner by the function used to read cell data for each day of your study period.
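Byte order is one format detail that is easy to miss: the format string 'f' uses your machine's native byte order, while a GrADS file may have been written big-endian (its descriptor file can declare this with an OPTIONS big_endian entry). A quick check, sketched below with a hypothetical peek_values helper, is to decode the first few values both ways. Whichever interpretation gives physically plausible numbers tells you whether to prefix your format string with '<' (little-endian) or '>' (big-endian).

```python
import struct


def peek_values(file_path, count=5):
    """Decode the first `count` 4-byte values as little- and big-endian floats."""
    with open(file_path, 'rb') as f:
        raw = f.read(4 * count)
    raw = raw[:len(raw) - len(raw) % 4]  # drop a trailing partial value
    little = [v[0] for v in struct.iter_unpack('<f', raw)]
    big = [v[0] for v in struct.iter_unpack('>f', raw)]
    return little, big
```

For wind speed, for example, one list will usually hold values of a sensible magnitude while the other holds tiny, huge, or nan values.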