.. _adding-data-to-data-module:
Adding data to the data module
==============================
Example datasets used in the documentation and tests are included in the
:mod:`kikuchipy.data` module via the `pooch `__
Python library. These are listed in a file registry (``kikuchipy.data._registry.py``)
with their file verification string (hash, MD5, obtain with e.g. ``md5sum ``) and
location, the latter potentially not within the package but from the `kikuchipy-data
`__ repository or elsewhere, since some files
are considered too large to include in the package.
If a required dataset isn't in the package, but is in the registry, it can be downloaded
from the repository when the user passes ``allow_download=True`` to e.g.
:func:`~kikuchipy.data.nickel_ebsd_large`. The dataset is then downloaded to a local
cache, in the location returned from ``pooch.os_cache("kikuchipy")``. The location can
be set with a global `KIKUCHIPY_DATA_DIR` variable locally, e.g. by setting
``export KIKUCHIPY_DATA_DIR=~/kikuchipy_data`` in ``~/.bashrc``. Pooch handles
downloading, caching, version control, file verification (against hash) etc. of files
not included in the package. If we have updated the file hash, pooch will re-download
it. If the file is available in the cache, it can be loaded as the other files in the
data module.
With every new version of kikuchipy, a new directory of datasets with the version name
is added to the cache directory. Any old directories are not deleted automatically, and
should then be deleted manually if desired.