Data storage with qudi
Qudi provides data storage objects that can be imported from
qudi.util.datastorage for saving and loading (measurement) data.
There is an object for each supported data storage format, which
currently includes:
- TextDataStorage for text files
- CsvDataStorage for csv files (specialized text file)
- NpyDataStorage for numpy binary files (.npy)
Check qudi.util.datastorage for any objects not listed in this documentation.
All storage classes are derived from qudi.util.datastorage.DataStorageBase, which very loosely defines a generalized API for all storage classes and handles global metadata.
The most important API methods that each specialized sub-class must implement are:
def save_data(self, data, *, metadata=None, notes=None, nametag=None, timestamp=None, **kwargs):
    # Save data to appropriate format
    pass

def load_data(self, *args, **kwargs):
    # Load data and metadata and return it
    pass
The exact method signatures with additional keyword-only arguments can differ between storage classes and can be looked up individually.
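To illustrate the keyword-only API pattern above, here is a minimal, self-contained sketch of a hypothetical JSON-based storage class. It is not part of qudi and deliberately does not inherit from DataStorageBase so that it runs without qudi installed; the file name pattern and the return values are assumptions modeled on the text file example further below.

```python
import json
import os
from datetime import datetime


class JsonDataStorage:
    """Hypothetical storage class following the save_data/load_data pattern.

    Stand-alone for illustration only; a real qudi storage class would inherit
    from qudi.util.datastorage.DataStorageBase instead.
    """

    def __init__(self, root_dir):
        self.root_dir = root_dir

    def save_data(self, data, *, metadata=None, notes=None, nametag=None,
                  timestamp=None, **kwargs):
        # Storage-specific options would arrive via **kwargs (unused here)
        if timestamp is None:
            timestamp = datetime.now()
        # Generic file name built from timestamp and optional nametag
        filename = timestamp.strftime('%Y%m%d-%H%M-%S')
        if nametag:
            filename += f'_{nametag}'
        file_path = os.path.join(self.root_dir, f'{filename}.json')
        content = {'timestamp': timestamp.isoformat(),
                   'notes': notes,
                   'metadata': dict(metadata) if metadata else {},
                   'data': data}  # data must be JSON-serializable, e.g. nested lists
        os.makedirs(self.root_dir, exist_ok=True)
        with open(file_path, 'w') as file:
            json.dump(content, file)
        return file_path, timestamp

    def load_data(self, file_path):
        # Return data and metadata, as the generic load_data description suggests
        with open(file_path) as file:
            content = json.load(file)
        return content['data'], content['metadata']
```

The real storage classes add format-specific keyword-only arguments on top of this common core, which is why their exact signatures should be looked up per class.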
Before you can start saving or loading data arrays with the methods
mentioned above, you need to instantiate and configure the storage
object appropriately. Each specialized storage object can provide an
entirely different set of parameters to initialize. You can look up
configuration options for a specific storage object in the __init__
method doc string of the respective class.
So the first step before loading and saving data arrays is always to create an instance of the desired storage object.
Here is an example for storing text files that uses a commonly used
subset of the available __init__ parameters to initialize the
storage object:
from qudi.util.datastorage import TextDataStorage, ImageFormat
# Instantiate text storage object and configure it
data_storage = TextDataStorage(root_dir='C:\\Data\\MyMeasurementCategory',
                               comments='# ',
                               delimiter='\t',
                               file_extension='.dat',
                               column_formats=('.8f', '.15e'),
                               include_global_metadata=True,
                               image_format=ImageFormat.PNG)
Let’s go through the parameters one by one:
- root_dir: The root or working directory for the storage class to work in. Files will be saved into this directory.
- comments: String used at the start of lines in the text file to identify them as comment lines.
- delimiter: Delimiter string used to separate data columns. Must be non-empty.
- file_extension: The default file extension to use for new data files. Used if no explicit file name is provided.
- column_formats: Sequence of format specifiers for each column or a single specifier for all columns. If None (default) the column format is derived from the first data row. See also the Python format specification mini-language.
- include_global_metadata: Flag indicating if global metadata should be automatically included when saving data.
- image_format: The image format used to save matplotlib figures to file using the storage method save_thumbnail.
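To illustrate how the column_formats and delimiter parameters shape a single data line, here is a minimal sketch using Python's built-in format(). format_row is a hypothetical helper, not necessarily how TextDataStorage formats rows internally, and it does not implement the column_formats=None case.

```python
def format_row(row, column_formats, delimiter='\t'):
    """Hypothetical helper: format one data row with per-column format specifiers."""
    # A single specifier string applies to all columns
    if isinstance(column_formats, str):
        column_formats = [column_formats] * len(row)
    # Apply each format specifier to its column and join with the delimiter
    return delimiter.join(format(value, spec) for value, spec in zip(row, column_formats))


# Second row of the sine example: x = 1/999, formatted with '.8f' and '.15e'
line = format_row((1 / 999, 0.0), ('.8f', '.15e'))
```

Here line becomes '0.00100100\t0.000000000000000e+00', matching the column layout of the example file shown later.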
Storage location
Generally you have to set the root_dir parameter for (file-based)
storage objects before saving or loading any data.
qudi modules provide the attribute module_default_data_dir containing a standardized
generic data directory. This directory respects the global config
options default_data_dir and daily_data_dirs and adds a
module-specific sub-directory. If applicable, you should always use
this attribute to set root_dir in storage objects used by a qudi
logic module. With default settings this directory is:
<user_home>/qudi/Data/<YYYY>/<MM>/<YYYYMMDD>/<configured module name>
In case you really want to customize the storage location on a
per-module basis, you should overwrite module_default_data_dir in
the module class definition in order to make the custom path accessible
from outside the module. By default all file-based data is stored in
daily sub-directories of the qudi data directory (default is
<user_home>/qudi/Data/, but it can be changed via the global config
parameter default_data_dir).
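The default directory layout described above can be sketched with pathlib. daily_module_data_dir is a hypothetical helper that only mirrors the documented <YYYY>/<MM>/<YYYYMMDD>/<module name> structure (PurePosixPath is used so the result is platform-independent); inside a qudi logic module you would simply use module_default_data_dir instead of building the path yourself.

```python
from datetime import datetime
from pathlib import PurePosixPath


def daily_module_data_dir(default_data_dir, module_name, date):
    """Hypothetical helper mirroring the documented daily data directory layout."""
    path = PurePosixPath(default_data_dir)
    # <YYYY>/<MM>/<YYYYMMDD> daily sub-directories, then the module name
    return path / f'{date:%Y}' / f'{date:%m}' / f'{date:%Y%m%d}' / module_name


example_dir = daily_module_data_dir('/home/user/qudi/Data', 'my_logic', datetime(2021, 5, 6))
```

For 6 May 2021 this yields /home/user/qudi/Data/2021/05/20210506/my_logic.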
Standalone scripts that use the qudi data storage objects obviously do
not need to follow any convention and can customize root_dir however
they like.
Saving data
save_data is used to store data in the desired format
once the storage object has been initialized.
import numpy as np
from datetime import datetime
# Create example data
x = np.linspace(0, 1, 1000) # 1 sec time interval
y = np.sin(2 * np.pi * 2 * x) # 2 Hz sine wave
data = np.asarray([x, y]).transpose() # Format data into a single 2D array with x being the first
# column and y being the second column
# Prepare a dict containing metadata to be saved in the file header
metadata = {'sample_number': 42,
            'batch': 'xyz-123'}
# Create an explicit timestamp.
timestamp = datetime(2021, 5, 6, 11, 11, 11) # 06.05.2021 at 11h:11m:11s
# timestamp = datetime.now() # Usually you would use this
# Create a nametag to include in the file name (optional)
nametag = 'amplitude_measurement'
# Create an iterable of data column header strings (optional)
column_headers = ('time (s)', 'amplitude (V)')
# Create an arbitrary string of informal "lab notes" that is included in the file header
notes = 'This measurement was performed under the influence of 10 mugs of coffee and no sleep.'
# Save data to file
file_path, timestamp, (rows, columns) = data_storage.save_data(
    data,
    timestamp=timestamp,
    metadata=metadata,
    notes=notes,
    nametag=nametag,
    column_headers=column_headers,
    column_dtypes=(float, float),
)
This will save the data to a file with a generic filename constructed
from nametag and timestamp, e.g.:
<default_data_dir>/2021/05/20210506/20210506-1111-11_amplitude_measurement.dat
with the following content:
# [General]
# timestamp=2021-05-06T11:11:11
# comments='# '
# delimiter='\t'
# column_dtypes=float;;float
# column_headers='time (s);;amplitude (V)'
# notes='This measurement was performed under the influence of 10 mugs of coffee and no sleep.'
#
# [Metadata]
# sample_number=42
# batch='xyz-123'
#
# ---- END HEADER ----
0.00000000 0.000000000000000e+00
0.00100100 1.257861783874106e-02
0.00200200 2.515524538937585e-02
⋮ ⋮
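The metadata entries in the header above appear in their repr form (e.g. batch='xyz-123'), which is why the repr/eval condition in the note below matters. A minimal pre-save check could look like this; survives_repr_round_trip is a hypothetical helper, not a qudi function, and it uses eval only because the condition is stated in terms of eval.

```python
def survives_repr_round_trip(value):
    """Hypothetical check: True if eval(repr(value)) reproduces value."""
    try:
        return eval(repr(value)) == value
    except Exception:
        # repr output that is not a valid expression (e.g. '<object ...>')
        return False


checks = {
    'int': survives_repr_round_trip(42),
    'str': survives_repr_round_trip('xyz-123'),
    'complex': survives_repr_round_trip(1 + 2j),
    'nan': survives_repr_round_trip(float('nan')),
}
```

Note that float('nan') fails the check: its repr is 'nan', which eval cannot resolve without extra names, and NaN never compares equal to itself anyway.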
Metadata values are converted to strings with repr and restored with
eval, so each value must satisfy value == eval(repr(value)).
The column_dtypes given to save_data must each be one of int, float, complex or str. This
will become important when loading back mixed data from disk. If
column_dtypes is None (default) the dtypes will be
automatically derived from the first data row.
Alternatively it is also possible to specify the filename directly instead of relying on the generic construction from nametag and timestamp:
# Save data to file
file_path, timestamp, (rows, columns) = data_storage.save_data(
    data,
    timestamp=timestamp,
    metadata=metadata,
    notes=notes,
    column_headers=column_headers,
    column_dtypes=(float, float),
    filename='my_custom_filename.abc',
)
This will create the file:
<default_data_dir>/2021/05/20210506/my_custom_filename.abc
Saving a thumbnail
In order to save a thumbnail alongside the data file, you can create a
matplotlib figure and pass it to the data storage method
save_thumbnail.
save_thumbnail expects a full file path without file extension
(the extension is completed automatically according to the configured
image_format enum). To give the thumbnail the same name as the data file,
take the file path returned by save_data, strip its extension and pass it to
save_thumbnail.
To continue our example with text files, this could look like:
import os
import matplotlib.pyplot as plt
# Create figure and plot data
fig = plt.figure()
ax = fig.add_subplot()
ax.plot(x, y)
ax.set_xlabel('time (s)')
ax.set_ylabel('amplitude (V)')
# Save figure as thumbnail with the same file name as the corresponding data file
# (os.path.splitext strips only the final extension, even if directory names contain dots)
figure_path = data_storage.save_thumbnail(fig, os.path.splitext(file_path)[0])
This example creates the file:
<default_data_dir>/2021/05/20210506/20210506-1111-11_amplitude_measurement.png
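When deriving the extension-less path for save_thumbnail, only the final extension should be removed. Splitting on every '.' cuts the path short as soon as a directory name contains a dot, whereas os.path.splitext strips only the last suffix. The path below is a made-up example:

```python
import os

file_path = '/data/run.v2/20210506-1111-11_amplitude_measurement.dat'  # made-up path

# Splits at every '.', so only the text before the FIRST dot survives
naive = file_path.rsplit('.')[0]
# Strips only the final extension
robust = os.path.splitext(file_path)[0]
```

Here naive is '/data/run' while robust is '/data/run.v2/20210506-1111-11_amplitude_measurement'.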
Loading data
All storage objects provide means to load back data and corresponding metadata from disk.
ToDo: COMPLETE THIS SECTION
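Until this section is completed, the following sketch shows how the text file format from the saving example could be read back with the standard library. parse_text_data is a hypothetical helper, not qudi's load_data (whose exact signature should be looked up in the respective storage class); it assumes the '# ' comment prefix, [Section] headers, key=value lines and the '---- END HEADER ----' marker shown above, and falls back to the raw string for values that are not Python literals (such as the timestamp).

```python
import ast

END_MARKER = '---- END HEADER ----'


def parse_text_data(text, comments='# ', delimiter='\t'):
    """Hypothetical parser for the header/data layout shown in the saving example."""
    header = {}
    section = None
    rows = []
    in_header = True
    for raw_line in text.splitlines():
        if in_header:
            # Strip the comment prefix (bare '#' lines become empty)
            if raw_line.startswith(comments):
                line = raw_line[len(comments):]
            else:
                line = raw_line.lstrip('#')
            line = line.strip()
            if line == END_MARKER:
                in_header = False
            elif line.startswith('[') and line.endswith(']'):
                section = line[1:-1]
                header[section] = {}
            elif '=' in line and section is not None:
                key, value = line.split('=', 1)
                try:
                    # Values written via repr are restored safely with literal_eval
                    header[section][key] = ast.literal_eval(value)
                except (ValueError, SyntaxError):
                    header[section][key] = value  # keep non-literal values as text
        elif raw_line.strip():
            rows.append(tuple(float(v) for v in raw_line.split(delimiter)))
    return header, rows
```

ast.literal_eval is used instead of eval because it only accepts Python literals and cannot execute arbitrary code from a file header.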
Global metadata
It is possible to set global metadata that will be automatically
included by all data storage objects (it is stored as a class attribute of
DataStorageBase) until it is actively removed again. Modules
adding global metadata must therefore handle robust and safe cleanup afterwards.
The global metadata is a dict and will be handled exactly the same as
the metadata keyword-only parameter of the data storage
save_data method, except that it does not need to be given each time data
is saved and it applies globally to all data storage instances
throughout the process. You can combine global metadata and locally
provided metadata. The latter will always take precedence over the
global metadata if keys are present in both dicts.
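The precedence rule can be sketched with plain dicts. merge_metadata is a hypothetical helper, not a qudi function; it only illustrates that local values win on key collisions while the global dict itself stays unchanged.

```python
def merge_metadata(global_metadata, local_metadata=None):
    """Hypothetical helper: combine global and local metadata, local values win."""
    merged = dict(global_metadata)  # copy so the global dict is not modified
    if local_metadata:
        merged.update(local_metadata)  # local keys override global ones
    return merged


global_meta = {'user': 'Batman', 'sample_number': 1}
combined = merge_metadata(global_meta, {'sample_number': 42, 'batch': 'xyz-123'})
```

combined then holds 'user' from the global metadata and the locally provided 'sample_number' (42) and 'batch'.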
Adding global metadata
You can add global metadata key-value pairs by using the storage object
class method <storage_class>.add_global_metadata. In our example
from above this would look like:
# Create global metadata to ADD to the global metadata dict
global_meta = {'user': 'Batman'}
# Add metadata in a thread-safe way to ALL data storage objects
data_storage.add_global_metadata(global_meta, overwrite=False)
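# Alternatively, this would have the same effect (use only one of these calls:
# with overwrite=False, adding keys that are already present raises a KeyError)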
from qudi.util.datastorage import DataStorageBase
DataStorageBase.add_global_metadata(global_meta)
# ...or this
from qudi.util.datastorage import NpyDataStorage
NpyDataStorage.add_global_metadata(global_meta)
# You can also add a single key-value pair like this:
data_storage.add_global_metadata('frustration_level', 9000, overwrite=False)
Note the overwrite parameter. If this flag is set to
False (default) the method will raise a KeyError if any
metadata keys to set are already present in the global metadata dict.
If it is set to True this method will silently overwrite any existing
key-value pairs. Keep overwrite=False whenever
possible in order to avoid hard-to-track bugs when two threads
(e.g. qudi logic modules) are using the same metadata keys.
Removing global metadata
Global metadata should be removed again by the module that added it. The
on_deactivate method of a qudi logic module would be a good place
to remove any global metadata that has been added by the same module.
You can remove global metadata with <storage_class>.remove_global_metadata, e.g. like:
# to remove a single key-value pair
data_storage.remove_global_metadata('user')
# or if you want to remove multiple key-value pairs with one call
data_storage.remove_global_metadata(['user', 'frustration_level'])
Reading global metadata
You can get a shallow copy of the global metadata dict via:
metadata = data_storage.get_global_metadata()
Since the returned dict is only a shallow copy of the actual global metadata dict, you must avoid mutating any of its values unless you are very sure what you are doing.
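The risk of mutating values of a shallow copy can be shown with plain dicts. In this sketch copy.copy stands in for get_global_metadata() (an assumption made so the example runs without qudi), and the metadata content is made up:

```python
import copy

# Stand-in for the global metadata dict (made-up content)
global_metadata = {'user': 'Batman', 'settings': {'gain': 1}}

# A shallow copy, comparable to what get_global_metadata() returns
snapshot = copy.copy(global_metadata)

snapshot['user'] = 'Robin'         # rebinding a key only changes the copy
snapshot['settings']['gain'] = 10  # mutating a nested value changes BOTH dicts

# A deep copy is fully independent and safe to mutate
independent = copy.deepcopy(global_metadata)
independent['settings']['gain'] = 0
```

If you need to modify nested values locally, copy.deepcopy of the returned dict avoids touching the shared global state.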
Logging Data
Another common use-case, instead of dumping an entire data set at once, is saving one chunk of data (or a single entry) at a time by appending to an already created file / database. This could for example be useful for a data logger.
In order to do this, TextDataStorage and CsvDataStorage have
additional API methods new_file and append_file.
new_file accepts the same keyword-only arguments as save_data
and will create a new data file containing only the file header. The
only difference is an additional keyword-only parameter dtype for
which you should provide a numpy dtype since it can not be derived
from the data array in this case (numpy.float will be assumed by
default).
Single or multiple rows of data can then be appended to the created file
using append_file (you can also append to files created by
save_data).
An example:
# Create data file with the same variables as in the save_data example above
file_path, timestamp = data_storage.new_file(
    timestamp=timestamp,
    metadata=metadata,
    notes=notes,
    nametag=nametag,
    column_headers=column_headers,
    column_dtypes=(float, float),
)
# Append each row of the previously created data array one after the other
for data_row in data:
    data_storage.append_file(data_row, file_path)
# You can also append a chunk of multiple rows at once
data_storage.append_file(data[:10], file_path)
Please note that every call to append_file will have the
overhead of opening and closing a file handle. For details on these methods,
refer to the docstrings of TextDataStorage or DataStorageBase.
Thread-Safety
The handling of the global parameters (read/add/remove) can be considered thread-safe.
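Thread-safe handling of a class-level metadata dict can be sketched with a lock shared by all accessors. GlobalMetadataRegistry is a hypothetical stand-in, not qudi's DataStorageBase: it mirrors the documented add/remove/get behavior (KeyError on existing keys unless overwrite=True, shallow copy on read), while making overwrite keyword-only and silently ignoring missing keys on removal are assumptions.

```python
import threading


class GlobalMetadataRegistry:
    """Hypothetical stand-in illustrating lock-protected global metadata."""
    _metadata = {}                 # class attribute shared by all instances
    _lock = threading.Lock()       # guards every read and write of _metadata

    @classmethod
    def add_global_metadata(cls, name, value=None, *, overwrite=False):
        # Accept either a dict or a single key-value pair
        new = dict(name) if isinstance(name, dict) else {name: value}
        with cls._lock:
            if not overwrite:
                duplicates = set(new).intersection(cls._metadata)
                if duplicates:
                    raise KeyError(f'Global metadata keys already present: {sorted(duplicates)}')
            cls._metadata.update(new)

    @classmethod
    def remove_global_metadata(cls, names):
        # Accept a single key or an iterable of keys
        if isinstance(names, str):
            names = [names]
        with cls._lock:
            for name in names:
                cls._metadata.pop(name, None)  # assumption: missing keys are ignored

    @classmethod
    def get_global_metadata(cls):
        with cls._lock:
            return cls._metadata.copy()  # shallow copy, as described above
```

Because the duplicate check and the update happen under the same lock, two threads cannot both pass the check for the same key and overwrite each other.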