Parsers Example
This example will demonstrate how diffpy.utils lets us easily process and serialize files. Using the parsers module, we can load file data into simple and easy-to-work-with Python objects.
To begin, unzip parserdata and take a look at data.txt. Our goal will be to extract and serialize the data table as well as the parameters listed in the header of this file.

To get the data table, we will use the loadData function. The default behavior of this function is to find and extract a data table from a file:

    from diffpy.utils.parsers import loadData
    data_table = loadData('<PATH to data.txt>')
While this will work with most datasets, on our data.txt file, we got a ValueError. This is caused by the comments "$ Phase Transition Near This Temperature Range" and "--> Note Significant Jump in Rw <--" embedded within the dataset. To fix this, try using the comments parameter:

    data_table = loadData('<PATH to data.txt>', comments=['$', '-->'])

This parameter tells loadData that any lines beginning with $ or --> are just comments, and that more entries in our data table may follow.

Here are a few other parameters to test out:
delimiter=',': Look for a comma-separated data table. Useful for csv file types. However, since data.txt is whitespace-separated, running loadData('<PATH to data.txt>', comments=['$', '-->'], delimiter=',') returns an empty list.
minrows=50: Only look for data tables with at least 50 rows. Since our data table has far fewer rows than that, running loadData('<PATH to data.txt>', comments=['$', '-->'], minrows=50) returns an empty list.
usecols=[0, 3]: Only return the 0th and 3rd columns (zero-indexed) of the data table. For data.txt, this corresponds to the temperature and rw columns:

    loadData('<PATH to data.txt>', comments=['$', '-->'], usecols=[0, 3])
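To make the effect of these parameters concrete, here is a minimal standard-library sketch of a table reader. The function parse_table and the sample values are hypothetical and invented for illustration; this is not how loadData is implemented, and the choice to raise ValueError on stray text inside the table only loosely mirrors the error described above.

```python
def parse_table(text, comments=(), delimiter=None, usecols=None, minrows=1):
    """Hypothetical stand-in for a table reader; not diffpy.utils code."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and lines starting with any comment prefix.
        if not stripped or stripped.startswith(tuple(comments)):
            continue
        try:
            values = [float(field) for field in stripped.split(delimiter)]
        except ValueError:
            if rows:
                # Non-numeric text after the table has started.
                raise ValueError(f"unexpected text in table: {stripped!r}") from None
            continue  # still in the header, keep looking for the table
        if usecols is not None:
            values = [values[i] for i in usecols]
        rows.append(values)
    # Report a table only if it has at least minrows rows.
    return rows if len(rows) >= minrows else []


# Invented sample values, used only to exercise the sketch.
sample = """\
temperature scale stretch rw
300 0.50 0.010 0.20
$ a comment inside the table
310 0.51 0.011 0.35
--> another comment
320 0.52 0.012 0.21
"""

full = parse_table(sample, comments=["$", "-->"])
subset = parse_table(sample, comments=["$", "-->"], usecols=[0, 3])
as_csv = parse_table(sample, comments=["$", "-->"], delimiter=",")
too_short = parse_table(sample, comments=["$", "-->"], minrows=50)
```

In this sketch, full holds three rows of four numbers, subset keeps only the first and last column of each row, and both as_csv and too_short are empty lists. Calling parse_table(sample) without the comments argument raises ValueError at the first embedded comment.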
Next, to get the header information, we can again use loadData, but this time with the headers parameter enabled:

    hdata = loadData('<PATH to data.txt>', comments=['$', '-->'], headers=True)
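As a rough picture of what a header dictionary can look like, the sketch below collects lines of the form name = value into a Python dict. Both the helper parse_header and the assumption that header parameters are written as name = value lines are illustrative only; the actual layout of data.txt is not shown here, and the parameter names are invented.

```python
def parse_header(text):
    """Hypothetical helper: collect `name = value` lines into a dict."""
    header = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a `name = value` line
        value = value.strip()
        try:
            header[name.strip()] = float(value)  # store numbers as floats
        except ValueError:
            header[name.strip()] = value  # keep everything else as text
    return header


# Invented header, for illustration only.
sample_header = """\
some free-form title line
wavelength = 0.1
qmax = 25
sample = Ni
"""

hdata_sketch = parse_header(sample_header)
```

Here hdata_sketch maps 'wavelength' and 'qmax' to floats and 'sample' to the string 'Ni'; the title line is ignored because it contains no equals sign.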
Rather than working with separate data_table and hdata objects, it may be easier to combine them into a single dictionary. We can do so using the serialize_data function:

    from diffpy.utils.parsers import serialize_data
    file_data = serialize_data('<PATH to data.txt>', hdata, data_table)
    # file_data is a dictionary with a single key
    # The key is the file name (in our case, 'data.txt')
    # The entry is a dictionary containing data from hdata and data_table
    data_dict = file_data['data.txt']
This dictionary, data_dict, contains all entries in hdata and an additional entry named 'data table' containing data_table:

    here_is_the_data_table = data_dict['data table']
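The nested layout described above can be reproduced with plain dictionaries. The helper combine below is hypothetical and only mimics the shape of the result (file name as the single key, header entries plus a 'data table' entry underneath); it makes no claim about how serialize_data builds it, and the values are invented.

```python
import os


def combine(path, hdata, data_table):
    """Hypothetical helper mimicking the dictionary layout described above."""
    entry = dict(hdata)               # copy every header entry
    entry["data table"] = data_table  # add the table under its own key
    return {os.path.basename(path): entry}


hdata_example = {"wavelength": 0.1}            # invented header values
table_example = [[300.0, 0.2], [310.0, 0.35]]  # invented table rows

file_data_sketch = combine("some/folder/data.txt", hdata_example, table_example)
data_dict_sketch = file_data_sketch["data.txt"]
```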
There is also an option to name columns in the data table and save those columns as entries instead:

    data_table_column_names = ['temperature', 'scale', 'stretch', 'rw']  # names of the columns in data.txt
    file_data = serialize_data('<PATH to data.txt>', hdata, data_table, dt_colnames=data_table_column_names)
    data_dict = file_data['data.txt']
Now we can extract specific data table columns from the dictionary:

    data_table_temperature_column = data_dict['temperature']
    data_table_rw_column = data_dict['rw']
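Conceptually, naming columns amounts to transposing the table and pairing each column with a name. The sketch below shows that idea with a hypothetical helper name_columns and invented values; it is an illustration of the concept, not the library's code.

```python
def name_columns(data_table, colnames):
    """Hypothetical helper: map each column name to that column's values."""
    columns = zip(*data_table)  # transpose rows into columns
    return {name: list(column) for name, column in zip(colnames, columns)}


# Invented rows in the order temperature, scale, stretch, rw.
rows_example = [
    [300.0, 0.50, 0.010, 0.20],
    [310.0, 0.51, 0.011, 0.35],
]
names_example = ["temperature", "scale", "stretch", "rw"]

columns_sketch = name_columns(rows_example, names_example)
temperature_sketch = columns_sketch["temperature"]
rw_sketch = columns_sketch["rw"]
```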
When we are done working with the data, we can store it on disk for later use. This can also be done using the serialize_data function with an additional serial_file parameter:

    parsed_file_data = serialize_data('<PATH to data.txt>', hdata, data_table, serial_file='<PATH to serialdata.json>')

The returned value, parsed_file_data, is the dictionary we just added to serialdata.json. To extract the data from the serial file, we use deserialize_data:

    from diffpy.utils.parsers import deserialize_data
    parsed_file_data = deserialize_data('<PATH to serialdata.json>')
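Since the serial file is a JSON file, the round trip can be pictured with Python's standard json module. This is a sketch under that assumption, using an invented dictionary and a temporary folder; it does not call diffpy.utils and says nothing about the exact file contents the library writes.

```python
import json
import os
import tempfile

# Invented dictionary in the nested layout used throughout this example.
stored = {"data.txt": {"wavelength": 0.1, "data table": [[300.0, 0.2]]}}

with tempfile.TemporaryDirectory() as folder:
    serial_path = os.path.join(folder, "serialdata.json")
    # Write the dictionary to disk as JSON.
    with open(serial_path, "w") as handle:
        json.dump(stored, handle)
    # Read it back into a fresh Python dictionary.
    with open(serial_path) as handle:
        restored = json.load(handle)
```

One design point worth noting: JSON stores nested lists and numbers but not array objects, which is why the sketch keeps the table as a list of lists.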
Finally, serialize_data allows us to store data from multiple text files in a single serial file. For one last bit of practice, we will extract and add the data from moredata.txt into the same serialdata.json file:

    data_table = loadData('<PATH to moredata.txt>')
    hdata = loadData('<PATH to moredata.txt>', headers=True)
    serialize_data('<PATH to moredata.txt>', hdata, data_table, serial_file='<PATH to serialdata.json>')

The serial file serialdata.json should now contain two entries: data.txt and moredata.txt. The data from each file can be accessed using:

    serial_data = deserialize_data('<PATH to serialdata.json>')
    data_txt_data = serial_data['data.txt']  # Access data.txt data
    moredata_txt_data = serial_data['moredata.txt']  # Access moredata.txt data
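Keeping several files in one serial file can be thought of as merging a new top-level entry into an existing JSON dictionary. The helper add_entry below is hypothetical and uses only the standard library with invented values. In particular, its choice to overwrite an existing entry with the same file name is an assumption of this sketch, not a documented behavior of serialize_data.

```python
import json
import os
import tempfile


def add_entry(serial_path, new_entry):
    """Hypothetical helper: merge new_entry into the JSON file at serial_path."""
    existing = {}
    if os.path.exists(serial_path):
        with open(serial_path) as handle:
            existing = json.load(handle)
    existing.update(new_entry)  # same-named entries are overwritten here
    with open(serial_path, "w") as handle:
        json.dump(existing, handle)
    return existing


with tempfile.TemporaryDirectory() as folder:
    serial_path = os.path.join(folder, "serialdata.json")
    add_entry(serial_path, {"data.txt": {"rw": [0.2, 0.35]}})   # invented values
    add_entry(serial_path, {"moredata.txt": {"rw": [0.15]}})    # invented values
    with open(serial_path) as handle:
        merged = json.load(handle)

entry_names = sorted(merged)
```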
For more information, check out the documentation of the parsers module.