Adding data to the data module
Example datasets used in the documentation and tests are included in the
kikuchipy.data module via the pooch Python library. They are listed in a file
registry together with their file verification string (an MD5 hash, obtained
with e.g. md5sum <file>) and their location. The location may be outside the
package, in the kikuchipy-data repository or elsewhere, since some files are
considered too large to include in the package.
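Where md5sum is not available (e.g. on Windows), the verification string can be computed in Python instead. The following is a minimal sketch using only the standard library; the helper name md5_hexdigest is hypothetical and not part of kikuchipy or pooch.

```python
import hashlib


def md5_hexdigest(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of the file at ``path``.

    Hypothetical helper (not part of kikuchipy or pooch). The file is
    read in chunks so that large datasets need not fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string should match the first column printed by md5sum <file>. Pooch also provides pooch.file_hash(), which accepts an alg argument and can be used for the same purpose.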
If a required dataset isn’t in the package but is in the registry, it can be
downloaded from the repository when the user passes allow_download=True to e.g.
nickel_ebsd_large(). The dataset is then downloaded to a local cache, in the
location returned by pooch.os_cache("kikuchipy"). The location can be changed
by setting the environment variable KIKUCHIPY_DATA_DIR, e.g. by adding
export KIKUCHIPY_DATA_DIR=~/kikuchipy_data to ~/.bashrc. Pooch handles
downloading, caching, version control, file verification (against the hash)
etc. of files not included in the package. If we have updated the file hash,
pooch will re-download the file. If the file is available in the cache, it can
be loaded like the other files in the data module.
With every new version of kikuchipy, a new directory of datasets named after the version is added to the cache directory. Old directories are not deleted automatically and must be deleted manually if desired.
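The manual cleanup can be scripted. The sketch below uses only the standard library and rests on two assumptions: KIKUCHIPY_DATA_DIR, when set, points at the cache directory that holds the per-version directories, and each of those directories is named exactly after a version. The function names resolve_cache_dir, stale_version_dirs and remove_dirs are hypothetical and not part of kikuchipy or pooch.

```python
import os
import shutil
from pathlib import Path


def resolve_cache_dir(default):
    """Return the cache directory, preferring KIKUCHIPY_DATA_DIR if set.

    ``default`` is the fallback location, e.g. the path returned by
    pooch.os_cache("kikuchipy").
    """
    custom = os.environ.get("KIKUCHIPY_DATA_DIR")
    return Path(custom).expanduser() if custom else Path(default)


def stale_version_dirs(cache_dir, current_version):
    """List subdirectories of ``cache_dir`` not named ``current_version``.

    Assumption: each version directory is named exactly after the version.
    Files directly inside the cache directory are ignored.
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    return sorted(
        p for p in cache_dir.iterdir()
        if p.is_dir() and p.name != current_version
    )


def remove_dirs(dirs):
    """Delete the given directories and everything inside them."""
    for d in dirs:
        shutil.rmtree(d)
```

Listing and deleting are kept separate on purpose, so the result of stale_version_dirs() can be inspected before anything is passed to remove_dirs().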