fairly package

Subpackages

Submodules

fairly.diff module

Diff class module.

Diff class is used to keep track of dataset modifications.

Usage example:

>>> diff = Diff()
>>> diff.modify("name", "Johnny", "John")
>>> diff.modified
{'name': ('Johnny', 'John')}

class fairly.diff.Diff

Bases: object

_added

Items added

Type:: Dict

_modified

Items modified

Type:: Dict

_removed

Items removed

Type:: Dict

add(key, val) → None

Appends an item to the diff set as added.

Parameters:

key – Item key
val – Item value

property added: Dict: Returns a dictionary of added items.

property modified: Dict: Returns a dictionary of modified items.

modify(key, val, oldval) → None

Appends an item to the diff set as modified.

Parameters:

key – Item key
val – Item value
oldval – Old value of the item

remove(key, val) → None

Appends an item to the diff set as removed.

Parameters:

key – Item key
val – Item value

property removed: Dict: Returns a dictionary of removed items.

fairly.metadata module

Metadata class module.

Metadata class is used to store metadata attributes in a standardized manner.

Usage example:

>>> metadata = Metadata(title="Title", doi="doi:xxx")
>>> metadata["authors"] = ["Doe, John"]

class fairly.metadata.Metadata(normalize: Callable = None, serialize: Callable = None, **kwargs)

Bases: MutableMapping

Metadata class.

_attrs

Metadata attributes.

Type:: Dict

_basis

Basis of metadata attributes.

Type:: Dict

_normalize

Attribute normalization method.

Type:: Callable

_serialize

Attribute serialization method.

Type:: Callable

Class Attributes:: REGEXP_DOI: Regular expression to validate DOI.

REGEXP_DOI = re.compile('10\\.\\d{4,9}/[-._;()/:a-z\\d]+', re.IGNORECASE)

autocomplete(overwrite: bool = False, attrs: List = None, **kwargs) → Dict

Completes missing metadata attributes by using the available information.

Supported attributes:

Any attribute with a data type of Person.
Any attribute with a data type of PersonList.

Parameters:

overwrite (bool) – Set True to overwrite existing attributes (default False).
attrs (List) – List of attributes to be completed (optional).
**kwargs – Arguments for the specific autocomplete methods.

Returns:

A dictionary of the attributes set by the method.

property is_modified: bool

Checks if metadata is modified.

Returns:: True if metadata is modified, False otherwise.

classmethod normalize_value(key: str, val) → Any

Normalizes metadata attribute value.

Supported attributes:

doi
keywords
authors

Parameters:

key (str) – Attribute key.
val – Attribute value.

Returns:

Normalized attribute value.

Raises:

ValueError – If invalid attribute value.
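The hypothetical helper below shows how `REGEXP_DOI` could support DOI normalization. It extracts the bare DOI from common notations and raises `ValueError` when none is found. The pattern is copied from `REGEXP_DOI` above. Whether fairly's `normalize_value()` applies it with `search` like this, or strips prefixes differently, is an assumption.

```python
import re

# Pattern copied from Metadata.REGEXP_DOI
REGEXP_DOI = re.compile(r"10\.\d{4,9}/[-._;()/:a-z\d]+", re.IGNORECASE)


def normalize_doi(val: str) -> str:
    """Hypothetical helper: return the bare DOI contained in val."""
    match = REGEXP_DOI.search(val)
    if not match:
        raise ValueError("Invalid DOI")
    return match.group(0)


for raw in ("10.5281/zenodo.6026285",
            "doi:10.5281/zenodo.6026285",
            "https://doi.org/10.5281/zenodo.6026285"):
    print(normalize_doi(raw))  # 10.5281/zenodo.6026285 each time
```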

print() → None

Pretty prints metadata.

Serializes the metadata and prints it as YAML without comments.

rebase() → None[source]: Updates the basis of the metadata attributes.

serialize() → Dict

Serializes metadata as a dictionary.

Returns:: Metadata dictionary.

classmethod serialize_value(key: str, val) → Any

Serializes metadata attribute value.

Supported attributes:

Any attribute with a data type of Person.
Any attribute with a data type of PersonList.

Parameters:

key (str) – Attribute key.
val – Attribute value.

Returns:

Serialized attribute value.

fairly.person module

Person class module.

Person class is used to store person (e.g. author) information in a standardized manner.

Usage example:

>>> person = Person("Doe, John")
>>> person = Person(fullname="Doe, John", orcid_id="xxx")
>>> person.affiliation = "fairly Community"

class fairly.person.Person(person: str = None, **kwargs)

Bases: MutableMapping

Class to handle person information, e.g. for authors, contributors, etc.

Class Attributes:: REGEXP_ORCID_ID: Regular expression to validate ORCID identifier. REGEXP_EMAIL: Regular expression to validate e-mail address.

REGEXP_EMAIL = re.compile('[\\w\\.+-]+@([\\w-]+\\.)+[\\w-]{2,}')

REGEXP_ORCID_ID = re.compile('(\\d{4}-){3}\\d{3}(\\d|X)')
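The two class-level patterns can be tried out directly. The sketch below copies them from the values shown above. As an assumption, it checks whole strings with `fullmatch`; fairly itself may apply the patterns with `search` or `match` instead. The ORCID pattern checks the format only. It does not verify the ORCID check digit.

```python
import re

# Patterns copied from Person.REGEXP_ORCID_ID and Person.REGEXP_EMAIL
REGEXP_ORCID_ID = re.compile(r"(\d{4}-){3}\d{3}(\d|X)")
REGEXP_EMAIL = re.compile(r"[\w\.+-]+@([\w-]+\.)+[\w-]{2,}")


def is_orcid_id(value: str) -> bool:
    # Format check only: four dash-separated groups, the last character may be X
    return REGEXP_ORCID_ID.fullmatch(value) is not None


def is_email(value: str) -> bool:
    return REGEXP_EMAIL.fullmatch(value) is not None


print(is_orcid_id("0000-0002-1825-0097"))  # True
print(is_orcid_id("0000-0002-1694-233X"))  # True
print(is_orcid_id("0000-0002-1825"))       # False
print(is_email("john.doe@example.org"))    # True
print(is_email("john.doe@localhost"))      # False
```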

autocomplete(overwrite: bool = False, orcid_token: str = None) → Dict

Completes missing information by using the ORCID identifier.

Parameters:

overwrite – If True, existing attributes are overwritten.
orcid_token – ORCID access token (optional).

Returns:: A dictionary of the attributes set by the method.

static from_orcid_id(orcid_id: str, token: str = None) → Person

Retrieves person information from ORCID identifier.

If not specified, the token is read from the fairly configuration. If it is not available there either, it is retrieved by using the get_orcid_token() method.

Parameters:

orcid_id – ORCID identifier.
token – ORCID access token.

Returns:

Person object if valid ORCID identifier, None otherwise.

Raises:

ValueError("No access token") – If access token is not available.
ValueError("Invalid ORCID identifier") – If ORCID identifier is not valid.
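The token lookup order described above can be sketched as a small resolver. The order is: explicit argument, then the fairly configuration, then `get_orcid_token()`. The function `resolve_orcid_token` and its `config` and `fetch_token` parameters are hypothetical and only model the documented precedence. The configuration key `orcid_token` is also an assumption.

```python
from typing import Callable, Dict, Optional


def resolve_orcid_token(token: Optional[str],
                        config: Dict[str, str],
                        fetch_token: Callable[[], str]) -> str:
    """Hypothetical helper mirroring the documented token precedence."""
    if token:
        return token  # 1. explicitly passed token
    token = config.get("orcid_token")  # 2. configured token (key name assumed)
    if token:
        return token
    token = fetch_token()  # 3. e.g. a call to get_orcid_token()
    if not token:
        raise ValueError("No access token")
    return token
```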

static get_orcid_token(client_id: str = None, client_secret: str = None) → str

Retrieves an ORCID access token by using the ORCID client id and secret.

ORCID access token is required to retrieve person information by using an ORCID ID.

If not specified, client_id and client_secret are read from the fairly configuration.

Parameters:

client_id – ORCID client id.
client_secret – ORCID client secret.

Returns:

ORCID access token.

Raises:

ValueError("No client id") – If client id is not available.
ValueError("No client secret") – If client secret is not available.
ValueError("Invalid response") – If access token is not retrieved.
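ORCID issues tokens for reading public data through the OAuth client-credentials flow. The client sends a POST to `https://orcid.org/oauth/token` with `client_id`, `client_secret`, `grant_type=client_credentials` and `scope=/read-public`. ORCID answers with a JSON object containing `access_token`.

The sketch below builds that request with the standard library and maps failures to the exceptions documented above. It illustrates the ORCID protocol and is not fairly's own code. `fetch_orcid_token` is a hypothetical name, and its `opener` parameter exists only so the function can be tested without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

ORCID_TOKEN_URL = "https://orcid.org/oauth/token"


def fetch_orcid_token(client_id: Optional[str], client_secret: Optional[str],
                      opener: Callable = urllib.request.urlopen) -> str:
    """Hypothetical helper: request a /read-public token from ORCID."""
    if not client_id:
        raise ValueError("No client id")
    if not client_secret:
        raise ValueError("No client secret")
    # Form-encoded body; passing data makes urllib send a POST request
    body = urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "client_credentials",
        "scope": "/read-public",
    }).encode("ascii")
    request = urllib.request.Request(
        ORCID_TOKEN_URL, data=body, headers={"Accept": "application/json"})
    with opener(request) as response:
        payload = json.load(response)
    token = payload.get("access_token")
    if not token:
        raise ValueError("Invalid response")
    return token
```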

static get_persons(people) → List[Person]

Returns a standard person list from the people argument.

A string or an iterable is accepted as input. If the input is a string, it is split using semicolons and line feeds as separators. Each resulting item is then processed as follows:

If it is a Person object, a copy is created.

If it is a string, it is parsed to a dictionary using parse().

If it is a dictionary, a Person object is created.

Parameters:: people – People argument.
Returns:: List of person objects.
Raises:: ValueError – If people argument is invalid.
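The input handling described above can be sketched with plain dictionaries in place of Person objects. The sketch makes these assumptions:

* `to_person_dicts` and `split_people` are hypothetical names.
* Trimming whitespace and dropping empty items is assumed behavior.
* The string branch is a stand-in for `parse()` that keeps the whole string as the full name.
* The Person-copy branch is omitted because the sketch has no Person class.

```python
import re
from collections.abc import Iterable
from typing import Any, Dict, List


def split_people(people: str) -> List[str]:
    # Split on semicolons and line feeds; trim whitespace and drop empty items
    return [item.strip() for item in re.split(r"[;\n]", people) if item.strip()]


def to_person_dicts(people: Any) -> List[Dict[str, Any]]:
    """Hypothetical simplification of Person.get_persons() returning dicts."""
    if isinstance(people, str):
        people = split_people(people)
    elif not isinstance(people, Iterable):
        raise ValueError("Invalid people argument")
    persons = []
    for item in people:
        if isinstance(item, str):
            # Stand-in for Person.parse(): keep the whole string as the full name
            persons.append({"fullname": item})
        elif isinstance(item, dict):
            persons.append(dict(item))  # copy, so the caller's dict is not shared
        else:
            raise ValueError("Invalid people argument")
    return persons


print(to_person_dicts("Doe, John; Roe, Jane\nPoe, Edgar"))
```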

classmethod parse(person: str) → Dict

Parses a person identifier and extracts the available person attributes.

The following attributes might be extracted:

name
surname
fullname
orcid_id

Parameters:: person – Person identifier (e.g. fullname)
Returns:: Dictionary of person attributes.
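The usage examples write names as "Surname, Name". Under that assumption, the hypothetical parser below recognizes either a bare ORCID iD or such a name. The real `parse()` may accept more formats and may handle them differently.

```python
import re
from typing import Dict

# Pattern copied from Person.REGEXP_ORCID_ID
REGEXP_ORCID_ID = re.compile(r"(\d{4}-){3}\d{3}(\d|X)")


def parse_person(person: str) -> Dict[str, str]:
    """Hypothetical parser for "Surname, Name" strings or bare ORCID iDs."""
    person = person.strip()
    if REGEXP_ORCID_ID.fullmatch(person):
        return {"orcid_id": person}
    attrs = {"fullname": person}
    if "," in person:
        # Assume the "Surname, Name" convention used in the usage examples
        surname, name = (part.strip() for part in person.split(",", 1))
        attrs.update(surname=surname, name=name)
    return attrs


print(parse_person("Doe, John"))            # {'fullname': 'Doe, John', 'surname': 'Doe', 'name': 'John'}
print(parse_person("0000-0002-1825-0097"))  # {'orcid_id': '0000-0002-1825-0097'}
```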

serialize() → Dict

Serializes person as a dictionary.

Returns:: Person dictionary.

class fairly.person.PersonList(iterable=None)

Bases: list

append(item)[source]: Append object to the end of the list.

extend(other)[source]: Extend list by appending elements from the iterable.

insert(index, item)[source]: Insert object before index.

Module contents

fairly

fairly.client(id: str, **kwargs) → Client

Creates a client object from a client or repository identifier.

The identifier is first checked against the recognized repository identifiers. If no match is found, it is regarded as a client identifier. Additional client arguments (e.g. API URL address) might be necessary for the latter.

Parameters:

id (str) – Client or repository identifier.
**kwargs – Other client arguments.

Returns:

Client object.

Raises:

ValueError("Invalid client id") – If invalid client id.

Examples

>>> # Create a 4TU.ResearchData client (id = "4tu")
>>> client = fairly.client("4tu")

>>> # Create a Figshare client with a custom URL address
>>> client = fairly.client("figshare", url="https://data.4tu.nl/")
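The two-step lookup described above can be sketched with plain dictionaries: the repository identifier is checked first, then the client identifier. The dictionaries are shaped like the outputs of `get_repositories()` and `get_clients()` documented below. `resolve_client` is hypothetical. Using a repository's `url` as a default client argument is an assumption of this sketch.

```python
from typing import Any, Dict, Tuple


def resolve_client(identifier: str, repositories: Dict[str, Dict[str, Any]],
                   clients: Dict[str, type], **kwargs: Any) -> Tuple[type, Dict[str, Any]]:
    """Hypothetical helper: return (client class, client arguments) for an identifier."""
    if identifier in repositories:
        repository = repositories[identifier]
        # Repository settings act as defaults; explicitly passed arguments win
        arguments = {"url": repository.get("url"), **kwargs}
        return clients[repository["client_id"]], arguments
    if identifier in clients:
        # Not a recognized repository: treat it as a client identifier
        return clients[identifier], dict(kwargs)
    raise ValueError("Invalid client id")


class FigshareClient:  # placeholder for a real client class
    pass


repositories = {"4tu": {"client_id": "figshare", "name": "4TU.ResearchData",
                        "url": "https://data.4tu.nl/"}}
clients = {"figshare": FigshareClient}
print(resolve_client("4tu", repositories, clients))
```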

fairly.dataset(id: str) → Dataset

Creates a dataset object from a dataset identifier.

The following types of dataset identifiers are supported:

DOI : Digital object identifier of a remote dataset.
URL : URL address of a remote dataset.
Path : Path of a local dataset.

Repository of the dataset is automatically detected by checking the URL addresses and the DOI prefixes of the recognized repositories.

Parameters:: id (str) – Dataset identifier.
Returns:: Dataset object.
Raises:: ValueError("Unknown dataset identifier") – If unknown dataset identifier.

Examples

>>> dataset = fairly.dataset("10.5281/zenodo.6026285")
>>> dataset = fairly.dataset("https://zenodo.org/records/6026285")
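Detection by URL address and DOI prefix can be sketched as below. 10.5281 is Zenodo's DOI prefix and 10.4121 is the DOI prefix of 4TU.ResearchData. The `doi_prefixes` key, the helper name and the shape of the repository entries are illustrative assumptions; fairly's actual repository configuration may be structured differently.

```python
import re
from typing import Dict, Optional

REGEXP_DOI = re.compile(r"10\.\d{4,9}/[-._;()/:a-z\d]+", re.IGNORECASE)

# Illustrative entries; the "doi_prefixes" key is a hypothetical name
REPOSITORIES: Dict[str, Dict] = {
    "zenodo": {"url": "https://zenodo.org/", "doi_prefixes": ["10.5281"]},
    "4tu": {"url": "https://data.4tu.nl/", "doi_prefixes": ["10.4121"]},
}


def detect_repository(identifier: str,
                      repositories: Dict[str, Dict] = REPOSITORIES) -> Optional[str]:
    """Hypothetical helper: return the id of the matching repository, or None."""
    # 1. URL addresses: compare against each repository's base URL
    if identifier.startswith(("http://", "https://")):
        for uid, repository in repositories.items():
            if identifier.startswith(repository["url"]):
                return uid
    # 2. DOIs (bare, "doi:" prefixed or doi.org URLs): compare the DOI prefix
    match = REGEXP_DOI.search(identifier)
    if match:
        prefix = match.group(0).split("/", 1)[0]
        for uid, repository in repositories.items():
            if prefix in repository["doi_prefixes"]:
                return uid
    # Anything else is treated as a local path or an unknown identifier
    return None
```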

fairly.debug(state: bool = True) → None

fairly.get_clients() → Dict

Returns available clients.

Returns:: Dictionary of the available clients. Keys are client identifiers (str), values are client classes (Client).
Raises:: AttributeError("Invalid client module", id) – If a client module is invalid.

Examples

>>> fairly.get_clients()
{'figshare': <class 'fairly.client.figshare.FigshareClient'>, ...}

fairly.get_config(key: str) → Dict

Returns configuration parameters for the specified key.

Configuration parameters are read from the following sources:

Configuration file of the package located at {package_root}/data/config.json.
Configuration file of the user located at ~/.fairly/config.json.
Environment variables of the user starting with FAIRLY_{KEY}_.

Parameters:: key (str) – Configuration key.
Returns:: Dictionary of configuration parameters for the specified key.

Examples

>>> fairly.get_config("fairly")
{'orcid_client_id': 'id', 'orcid_client_secret': 'secret', ...}
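Reading several sources and combining them can be sketched as a layered merge. The sketch makes these assumptions:

* Each `config.json` file is a JSON object keyed by configuration key.
* Missing files are skipped.
* Later sources override earlier ones, in the order listed above. The list does not state any precedence, so this order is the sketch's own.

`load_layered_config` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Union


def load_layered_config(key: str, paths: Iterable[Union[str, Path]],
                        env_config: Dict[str, str]) -> Dict:
    """Hypothetical helper: merge file-based and environment configuration."""
    config: Dict = {}
    for path in paths:
        try:
            data = json.loads(Path(path).read_text(encoding="utf-8"))
        except FileNotFoundError:
            continue  # a missing configuration file is simply skipped
        config.update(data.get(key, {}))
    # Environment variables are applied last, so they win (assumed precedence)
    config.update(env_config)
    return config
```

In use, `paths` would list the package file and `Path.home() / ".fairly" / "config.json"`, and `env_config` would come from the environment variables.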

fairly.get_environment_config(key: str) → Dict

Returns configuration parameters for the specified key from environment variables.

Parameters:: key (str) – Configuration key.
Returns:: Dictionary of configuration parameters for the specified key.

Examples

>>> fairly.get_environment_config("fairly")
{'orcid_client_id': 'id', ...}
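Below is a minimal sketch of the environment-variable lookup. It assumes that the part of the variable name after the `FAIRLY_{KEY}_` prefix is lowercased to form the parameter name, which matches the example output above. The helper `environment_config` is hypothetical and not fairly's implementation. It takes the environment as an optional argument so it can be tested.

```python
import os
from typing import Dict, Mapping, Optional


def environment_config(key: str,
                       environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: collect FAIRLY_{KEY}_* variables into a dictionary."""
    if environ is None:
        environ = os.environ
    prefix = f"FAIRLY_{key.upper()}_"
    return {
        name[len(prefix):].lower(): value
        for name, value in environ.items()
        if name.startswith(prefix)
    }


# The variable names follow the FAIRLY_{KEY}_ rule literally
env = {"FAIRLY_FAIRLY_ORCID_CLIENT_ID": "id", "FAIRLY_ZENODO_TOKEN": "abc", "HOME": "/home/user"}
print(environment_config("fairly", env))  # {'orcid_client_id': 'id'}
print(environment_config("zenodo", env))  # {'token': 'abc'}
```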

fairly.get_repositories() → Dict

Returns recognized repositories.

Returns:

Dictionary of the recognized repositories. Keys are repository identifiers (str), values are repository dictionaries (Dict).

Raises:

ValueError – If configuration is invalid.
AttributeError – If a repository has no client id.
AttributeError – If a repository has invalid client id.

Examples

>>> fairly.get_repositories()
{'4tu': {'client_id': 'figshare', 'name': '4TU.ResearchData', 'url': 'https://data.4tu.nl/', ...}, ...}

fairly.get_repository(uid: str) → Dict

Returns repository dictionary of the specified repository.

Parameters:: uid (str) – Repository id or URL address.
Returns:: Repository dictionary if a recognized repository, None otherwise.

Examples

>>> fairly.get_repository("4tu")
{'id': '4tu', 'client_id': 'figshare', 'name': '4TU.ResearchData', 'url': 'https://data.4tu.nl/', ...}

>>> fairly.get_repository("5tu") is None
True

fairly.init_dataset(path: str, template: str = 'default', create: bool = True) → LocalDataset

Initializes a local dataset.

Parameters:

path (str) – Local path of the dataset.
template (str) – Template of the dataset (default = 'default').
create (bool) – Set True to create the dataset directory if it does not exist (default = True).

Returns:

Local dataset object

Raises:

ValueError("Invalid path") – If path is invalid.
NotADirectoryError – If path is not a directory path.
ValueError("Operation not permitted") – If path is an existing dataset path.
ValueError("Invalid template name") – If template name is invalid.

fairly.is_testing() → bool

Returns unit testing state.

Returns:: True if performing unit tests, False otherwise

fairly.max_workers() → int[source]: Returns maximum number of workers for file operations.

fairly.metadata_templates() → List

Returns list of available metadata templates.

Returns:: List of available metadata templates (str).

Examples

>>> fairly.metadata_templates()
['default', 'zenodo', 'figshare']

fairly.notify(file: File, current_size: int, total_size: int = None, current_total_size: int = None) → None

Displays file transfer information.

Parameters:

file (File) – File object.
current_size (int) – Current size of the file.
total_size (int) – Total size of the file (optional).
current_total_size (int) – Current total size of the transfer operation (optional).
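`store()` accepts a notification callback. This sketch assumes that such a callback is called with the same arguments as `notify()` above. Under that assumption, a custom progress reporter could look like the following. Both function names are hypothetical. The reporter only uses `str(file)`, because the attributes of `File` are not described here.

```python
from typing import Any, Optional


def format_progress(name: str, current_size: int, total_size: Optional[int] = None,
                    current_total_size: Optional[int] = None) -> str:
    """Hypothetical helper: build a one-line progress message."""
    if total_size:
        percent = 100 * current_size / total_size
        message = f"{name}: {percent:.1f}% ({current_size}/{total_size} bytes)"
    else:
        message = f"{name}: {current_size} bytes"
    if current_total_size is not None:
        message += f", {current_total_size} bytes transferred in total"
    return message


def print_progress(file: Any, current_size: int, total_size: Optional[int] = None,
                   current_total_size: Optional[int] = None) -> None:
    # Same parameter list as fairly.notify(), assuming store() calls it that way
    print(format_progress(str(file), current_size, total_size, current_total_size))


print_progress("data.csv", 50, 200)  # data.csv: 25.0% (50/200 bytes)
```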

fairly.resolveDOI(doi: str) → str

Returns the URL address that a DOI resolves to.

Parameters:: doi (str) – Digital object identifier
Returns:: URL address of the DOI.
Raises:: ValueError("Invalid DOI") – If DOI is invalid.
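DOIs are resolved through the doi.org proxy, which redirects `https://doi.org/<DOI>` to the landing page of the object. The sketch below follows that redirect with `urllib.request.urlopen`, which follows redirects by default, and returns the final URL. It illustrates the mechanism and is not fairly's `resolveDOI()`. `resolve_doi` is a hypothetical name, and its `opener` parameter exists only so the function can be tested without network access.

```python
import re
import urllib.request
from typing import Callable

# Pattern copied from Metadata.REGEXP_DOI
REGEXP_DOI = re.compile(r"10\.\d{4,9}/[-._;()/:a-z\d]+", re.IGNORECASE)


def resolve_doi(doi: str, opener: Callable = urllib.request.urlopen) -> str:
    """Hypothetical helper: return the URL address a DOI redirects to."""
    match = REGEXP_DOI.fullmatch(doi.strip())
    if not match:
        raise ValueError("Invalid DOI")
    with opener(f"https://doi.org/{match.group(0)}") as response:
        # response.url holds the final URL after any redirects
        return response.url
```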

fairly.set_max_workers(num: int = None, force: bool = False) → int

Sets the maximum number of workers for file operations.

The maximum number of workers is limited to MAX_WORKERS unless the force flag is set.

Parameters:

num (int) – Maximum number of workers for file operations.
force (bool) – Set True to increase the number beyond MAX_WORKERS (default False).

Returns:

Maximum number of workers for file operations.

Raises:

ValueError("Invalid maximum number of workers") – If the number is more than the number of available cores.
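The hypothetical helper below sketches one reading of the rules above. The requested number is capped at `MAX_WORKERS` unless `force` is set, and any number above the available cores is rejected. The sketch makes these assumptions:

* The value 4 for `MAX_WORKERS` is illustrative.
* Values below 1 are rejected.
* Handling of `num=None` is left out.

```python
import os
from typing import Optional

MAX_WORKERS = 4  # illustrative value; fairly's actual constant may differ


def choose_max_workers(num: int, force: bool = False, limit: int = MAX_WORKERS,
                       cores: Optional[int] = None) -> int:
    """Hypothetical helper applying the documented limits to a requested worker count."""
    cores = cores or os.cpu_count() or 1
    if num < 1 or num > cores:
        raise ValueError("Invalid maximum number of workers")
    # Without force, never exceed the package-wide limit
    return num if force else min(num, limit)


print(choose_max_workers(8, cores=16))              # 4
print(choose_max_workers(8, force=True, cores=16))  # 8
```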

fairly.store(id: str, path: str = None, notify: Callable = None, extract: bool = False) → LocalDataset

Stores a remote dataset locally.

Parameters:

id (str) – Dataset identifier.
path (str) – Local path to store the dataset (optional).
notify (Callable) – Notification callback function.
extract (bool) – Set True to extract dataset archives (default = False)

Returns:

Local dataset object