fairly.client package

Submodules

fairly.client.djehuty module

class fairly.client.djehuty.DjehutyClient(repository_id: str = None, **kwargs)

Bases: Client

4TU.ResearchData uses custom_fields to store the following information:

  • Contributors

  • Data Link

  • Derived From

  • Format

  • Geolocation Latitude

  • Geolocation Longitude

  • Geolocation

  • Language

  • Licence remarks

  • Organizations

  • Publisher

  • Same As

  • Time coverage

From the developers:

> The use of “Data Link” is inconsistent, so try to avoid using it. In djehuty, we will assign “Data Link” values as “file links” where applicable (so they show up under “files”). I think also “Organizations” and “Publisher” are not entirely consistent and will have gone through manual cleanup once djehuty goes live.

LOCKED_SLEEP = 5
LOCKED_TRIES = 5
PAGE_SIZE = 25
REGEXP_UUID = re.compile('([a-f\\d]+)(-[a-f\\d]+)+', re.IGNORECASE)
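
To illustrate what REGEXP_UUID accepts, here is a minimal sketch that recompiles the same pattern and tests a few strings. The helper name looks_like_uuid and the use of fullmatch are assumptions made for this example; the listing does not show how the client applies the pattern.

```python
import re

# Pattern copied from the REGEXP_UUID attribute above.
REGEXP_UUID = re.compile(r"([a-f\d]+)(-[a-f\d]+)+", re.IGNORECASE)


def looks_like_uuid(value: str) -> bool:
    """Hypothetical helper: True if the whole string is hex groups joined by hyphens."""
    return REGEXP_UUID.fullmatch(value) is not None


# A canonical UUID has five hyphen-separated hexadecimal groups.
print(looks_like_uuid("123e4567-e89b-12d3-a456-426614174000"))
# A plain number has no hyphenated group, so it does not match.
print(looks_like_uuid("12345678"))
# The pattern does not check group lengths, so short hex groups also match.
print(looks_like_uuid("ab-cd"))
```

Note that the pattern only requires two or more hexadecimal groups separated by hyphens, so it is looser than a strict 8-4-4-4-12 UUID check.
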
property categories: Dict
get_categories(refresh: bool = False) → Dict
classmethod get_config(**kwargs) → Dict
classmethod get_config_parameters() → Dict

Returns configuration parameters.

Returns:

Dictionary of configuration parameters. Keys are the parameter names, values are the descriptions.

get_details(id: Dict) → Dict

Returns standard details of the specified dataset.

Details dictionary:
  • title (str): Title

  • url (str): URL address

  • doi (str): DOI

  • status (str): Status

  • size (int): Total size of data files in bytes

  • created (datetime.datetime): Creation date and time

  • modified (datetime.datetime): Last modification date and time

Possible statuses are as follows:
  • “draft”: Dataset is not published yet.

  • “public”: Dataset is published and is publicly available.

  • “embargoed”: Dataset is published, but is under embargo.

  • “restricted”: Dataset is published, but accessible only under certain conditions.

  • “closed”: Dataset is published, but accessible only by the owners.

  • “unknown”: Dataset is in an unknown state.

Parameters:

id (Dict) – Standard dataset id

Returns:

Details dictionary of the dataset.
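
As a rough illustration of how a details dictionary might be consumed, the sketch below builds a one-line summary from the keys and statuses listed above. The sample values, the PUBLISHED set, and the summarize helper are all hypothetical; only the key names and status strings come from this page.

```python
from datetime import datetime

# Hypothetical details dictionary shaped like the one described above;
# every value here is made up for illustration.
details = {
    "title": "Example dataset",
    "url": "https://example.org/datasets/1",
    "doi": "10.1234/example",
    "status": "embargoed",
    "size": 1536,
    "created": datetime(2023, 1, 1, 12, 0),
    "modified": datetime(2023, 1, 2, 12, 0),
}

# Statuses that the list above describes as "published".
PUBLISHED = {"public", "embargoed", "restricted", "closed"}


def summarize(details: dict) -> str:
    """Hypothetical helper: build a short summary of a details dictionary."""
    status = details.get("status", "unknown")
    state = "published" if status in PUBLISHED else "unpublished or unknown"
    size = details.get("size")
    size_text = f"{size} bytes" if size is not None else "size unknown"
    return f"{details['title']}: {status} ({state}), {size_text}"


print(summarize(details))
```

The helper treats a missing size as unknown rather than zero, which keeps it usable with clients that report size only optionally.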

get_files(id: Dict) → List[RemoteFile]
property licenses: Dict

Retrieves list of available licenses

License dictionary:
  • id (int): License identifier

  • name (str): Name of the license

  • url (str): URL address of the license

Returns:

List of license dictionaries

record_type_lookup = {'conference contribution': 'conferencepaper', 'journal contribution': 'article', 'media': 'video'}
record_types = {'book': 'Book', 'conference contribution': 'Conference Contribution', 'dataset': 'Dataset', 'figure': 'Figure', 'journal contribution': 'Journal Contribution', 'media': 'Media', 'online resource': 'Online Resource', 'poster': 'Poster', 'preprint': 'Preprint', 'presentation': 'Presentation', 'software': 'Software', 'thesis': 'Thesis'}
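
The listing does not say how the two dictionaries are combined. The sketch below assumes that record_type_lookup translates a record type key into an alternative type name and that keys absent from it are used unchanged; both that behaviour and the helper describe_record_type are assumptions for illustration only.

```python
# Dictionaries copied from the DjehutyClient attributes above.
record_type_lookup = {
    "conference contribution": "conferencepaper",
    "journal contribution": "article",
    "media": "video",
}
record_types = {
    "book": "Book",
    "conference contribution": "Conference Contribution",
    "dataset": "Dataset",
    "figure": "Figure",
    "journal contribution": "Journal Contribution",
    "media": "Media",
    "online resource": "Online Resource",
    "poster": "Poster",
    "preprint": "Preprint",
    "presentation": "Presentation",
    "software": "Software",
    "thesis": "Thesis",
}


def describe_record_type(key: str) -> tuple:
    """Hypothetical helper: return (translated type name, display label)."""
    key = key.strip().lower()
    if key not in record_types:
        raise KeyError(f"Unknown record type: {key}")
    # Assumption: keys missing from the lookup table map to themselves.
    return record_type_lookup.get(key, key), record_types[key]


print(describe_record_type("Media"))
print(describe_record_type("dataset"))
```
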
save_metadata(id: Dict, metadata: Metadata) → None

Saves metadata of the specified dataset

Parameters:
  • id (Dict) – Standard dataset id

  • metadata (Metadata) – Metadata to be saved

Raises:

ValueError("No access token") – If no access token is set.

classmethod supports_folder() → bool

Returns whether folders are supported.

validate_metadata(metadata: Metadata) → Dict

Validates metadata

Parameters:

metadata (Metadata) – Metadata to be validated

Returns:

Dictionary of invalid metadata fields and related error messages, if any.

fairly.client.figshare module

class fairly.client.figshare.FigshareClient(repository_id: str = None, **kwargs)

Bases: Client

LOCKED_SLEEP = 5
LOCKED_TRIES = 5
PAGE_SIZE = 25
property categories: Dict
get_categories(refresh: bool = False) → Dict
classmethod get_client(url: str) → Client

Creates a repository client from the specified URL address.

Parameters:

url (str) – URL address of the repository or dataset.

Returns:

Client object (FigshareClient).

Raises:

ValueError("Invalid repository") – If repository is not valid.

classmethod get_config(**kwargs) → Dict
classmethod get_config_parameters() → Dict

Returns configuration parameters.

Returns:

Dictionary of configuration parameters. Keys are the parameter names, values are the descriptions.

get_details(id: Dict) → Dict

Returns standard details of the specified dataset.

Details dictionary:
  • title (str): Title

  • url (str): URL address

  • doi (str): DOI

  • status (str): Status

  • size (int): Total size of data files in bytes

  • created (datetime.datetime): Creation date and time

  • modified (datetime.datetime): Last modification date and time

Possible statuses are as follows:
  • “draft”: Dataset is not published yet.

  • “public”: Dataset is published and is publicly available.

  • “embargoed”: Dataset is published, but is under embargo.

  • “restricted”: Dataset is published, but accessible only under certain conditions.

  • “closed”: Dataset is published, but accessible only by the owners.

  • “unknown”: Dataset is in an unknown state.

Parameters:

id (Dict) – Standard dataset id

Returns:

Details dictionary of the dataset.

get_files(id: Dict) → List[RemoteFile]
property licenses: Dict

Retrieves list of available licenses

License dictionary:
  • id (int): License identifier

  • name (str): Name of the license

  • url (str): URL address of the license

Returns:

List of license dictionaries

record_type_lookup = {'conference contribution': 'conferencepaper', 'journal contribution': 'article', 'media': 'video', 'online resource': 'onlineresource'}
record_types = {'book': 'Book', 'conference contribution': 'Conference Contribution', 'dataset': 'Dataset', 'figure': 'Figure', 'journal contribution': 'Journal Contribution', 'media': 'Media', 'online resource': 'Online Resource', 'poster': 'Poster', 'preprint': 'Preprint', 'presentation': 'Presentation', 'software': 'Software', 'thesis': 'Thesis'}
save_metadata(id: Dict, metadata: Metadata) → None

Saves metadata of the specified dataset.

Parameters:
  • id (Dict) – Standard dataset id.

  • metadata (Metadata) – Metadata to be saved.

Raises:

ValueError("No access token") – If no access token is set.

classmethod supports_folder() → bool

Returns whether folders are supported.

validate_metadata(metadata: Metadata) → Dict

Validates metadata

Parameters:

metadata (Metadata) – Metadata to be validated

Returns:

Dictionary of invalid metadata fields and related error messages, if any.

fairly.client.invenio module

class fairly.client.invenio.InvenioClient(repository_id: str = None, **kwargs)

Bases: Client

Class Attributes:

  • PAGE_SIZE (int): Default page size.

  • KEEP_ALIVE (int): Keep-alive duration in seconds.

_details

Record details cache.

Type:

Dict

KEEP_ALIVE = 10
PAGE_SIZE = 100
classmethod get_client(url: str) → Client

Creates a repository client from the specified URL address.

Parameters:

url (str) – URL address of the repository or dataset.

Returns:

Client object (InvenioClient).

Raises:

ValueError("Invalid repository") – If repository is not valid.

classmethod get_config(**kwargs) → Dict
classmethod get_config_parameters() → Dict

Returns configuration parameters.

Returns:

Dictionary of configuration parameters. Keys are the parameter names, values are the descriptions.

get_details(id: Dict) → Dict

Returns standard details of the specified dataset.

Details dictionary:
  • title (str): Title

  • url (str): URL address

  • doi (str): DOI

  • status (str): Status

  • size (int): Total size of data files in bytes (optional)

  • created (datetime.datetime): Creation date and time

  • modified (datetime.datetime): Last modification date and time

Possible statuses are as follows:
  • “draft”: Dataset is not published yet.

  • “public”: Dataset is published and is publicly available.

  • “embargoed”: Dataset is published, but is under embargo.

  • “restricted”: Dataset is published, but accessible only under certain conditions.

  • “closed”: Dataset is published, but accessible only by the owners.

  • “error”: Dataset is in an error state.

  • “unknown”: Dataset is in an unknown state.

Parameters:

id (Dict) – Standard dataset id.

Returns:

Details dictionary of the dataset.

get_files(id: Dict) → List[RemoteFile]

Retrieves list of files of the specified dataset.

Parameters:

id (Dict) – Standard dataset identifier.

Returns:

List of dataset files (RemoteFile).

Raises:
  • ValueError("Operation not permitted") – If files are restricted.

  • ValueError("Invalid dataset id") – If invalid dataset identifier.

save_metadata(id: Dict, metadata: Metadata) → None

Saves metadata of the specified dataset

Parameters:
  • id (Dict) – Standard dataset id

  • metadata (Metadata) – Metadata to be saved

Raises:

ValueError("No access token") – If no access token is set.

classmethod supports_folder() → bool

Returns whether folders are supported.

validate_metadata(metadata: Metadata) → Dict

Validates metadata

Parameters:

metadata (Metadata) – Metadata to be validated

Returns:

Dictionary of invalid metadata fields and related error messages, if any.

Module contents

class fairly.client.Client(repository_id: str = None, **kwargs)

Bases: ABC

config

Configuration options

Type:

Dict

_session

HTTP session object

Type:

Session

_datasets

Public dataset cache

Type:

Dict

_account_datasets

Account dataset cache

Type:

List

Class Attributes:

  • REGEXP_URL: Regular expression to validate URL address.

  • REQUEST_FORMAT: Request data format.

  • CHUNK_SIZE: Chunk size in bytes to transfer data (default = 262144).

CHUNK_SIZE = 262144
REGEXP_URL = re.compile('(http(s)?):\\/\\/(www\\.)?[a-z\\d@:%._\\+~#=-]{2,256}\\.[a-z]{2,6}\\b([-a-z\\d@:%_\\+.~#?&//=]*)', re.IGNORECASE)
REQUEST_FORMAT = 'json'
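
To show which strings REGEXP_URL accepts, here is a small sketch that recompiles the same pattern and checks a few inputs. The helper is_valid_url and the choice of fullmatch are assumptions for this example; the page does not show how the client applies the expression.

```python
import re

# Pattern copied from the REGEXP_URL attribute above.
REGEXP_URL = re.compile(
    r"(http(s)?):\/\/(www\.)?[a-z\d@:%._\+~#=-]{2,256}\.[a-z]{2,6}\b"
    r"([-a-z\d@:%_\+.~#?&//=]*)",
    re.IGNORECASE,
)


def is_valid_url(text: str) -> bool:
    """Hypothetical helper: True if the whole string matches REGEXP_URL."""
    return REGEXP_URL.fullmatch(text) is not None


# Only http and https schemes are accepted by the pattern.
print(is_valid_url("https://data.4tu.nl/datasets"))
print(is_valid_url("ftp://example.org/file"))
print(is_valid_url("not a url"))
```

Because the pattern limits the top-level domain to 2 to 6 letters, hosts with longer top-level domains would be rejected by this check.
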
property client_id: str

Client identifier

create_dataset(metadata=None) → RemoteDataset

Creates a dataset with the specified metadata

Parameters:

metadata – Metadata of the dataset (optional)

Returns:

Dataset

Raises:
  • ValueError("Invalid metadata")

  • ValueError("Invalid metadata", validation_result)
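
The second form of the exception carries the validation result as an extra argument. The sketch below shows one way a caller might read it back through the standard args attribute of Python exceptions. The function create_dataset_stub and its sample validation result are hypothetical stand-ins so that the example runs without a repository connection.

```python
def create_dataset_stub(metadata):
    """Hypothetical stand-in that fails the way the Raises list above describes."""
    validation_result = {"title": "Title is required"}
    raise ValueError("Invalid metadata", validation_result)


try:
    create_dataset_stub({})
except ValueError as err:
    # The first argument is the message; the second, if present, is the
    # dictionary of invalid fields and their error messages.
    message = err.args[0]
    problems = err.args[1] if len(err.args) > 1 else {}

print(message)
for field, error in problems.items():
    print(f"  {field}: {error}")
```

Checking the length of args first lets the same handler cover both forms of the exception.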

delete_dataset(id, **kwargs) → None

Deletes dataset from the repository.

Parameters:
  • id – Dataset identifier.

  • **kwargs – Other identifier arguments.

delete_file(dataset, file) → None
download_file(file: RemoteFile, path: str = None, name: str = None, notify: Callable = None) → LocalFile

Downloads a remote file.

Parameters:
  • file (RemoteFile) – Remote file.

  • path (str) – Local path of the file (optional).

  • name (str) – Local name of the file (optional).

  • notify (Callable) – Notification callback method (optional).

Returns:

Local file object.

Raises:
  • ValueError("No URL address") – If remote file has no URL address.

  • IOError("Invalid MD5 checksum") – If MD5 checksum is invalid.
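
The checksum failure above implies that the downloaded file is hashed and compared with an expected MD5 value. A minimal sketch of such a check, reading the file in chunks of CHUNK_SIZE bytes, could look like the following. The helpers md5_of_file and verify_md5 are hypothetical and are not taken from the library source.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 262144  # Value of Client.CHUNK_SIZE in the attribute listing.


def md5_of_file(path: str, chunk_size: int = CHUNK_SIZE) -> str:
    """Hypothetical helper: compute the MD5 hex digest of a file chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as stream:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected: str) -> None:
    """Hypothetical helper: raise IOError if the file does not match the checksum."""
    if md5_of_file(path) != expected.lower():
        raise IOError("Invalid MD5 checksum")


# Demonstration with a temporary file slightly larger than one chunk.
payload = b"x" * (CHUNK_SIZE + 10)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    temp_path = tmp.name
try:
    verify_md5(temp_path, hashlib.md5(payload).hexdigest())
    checksum_ok = True
finally:
    os.remove(temp_path)

print(checksum_ok)
```

Hashing in chunks keeps memory use bounded regardless of file size, which matters for large data files.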

get_account_datasets(refresh: bool = False) → List[RemoteDataset]
classmethod get_config(**kwargs) → Dict
classmethod get_config_parameters() → Dict

Returns configuration parameters.

Returns:

Dictionary of configuration parameters. Keys are the parameter names, values are the descriptions.

get_dataset(id=None, refresh: bool = False, **kwargs) → RemoteDataset
get_dataset_id(id=None, **kwargs) → Dict

Returns standard dataset identifier

Parameters:
  • id – Dataset identifier

  • **kwargs – Other identifier arguments

Returns:

Standard dataset identifier

get_dataset_plain_id(id: Dict) → str

Returns plain standard dataset identifier.

Parameters:

id (Dict) – Standard dataset identifier.

Returns:

Plain standard dataset identifier.

abstract get_details(id: Dict) → Dict

Returns standard details of the specified dataset.

Details dictionary:
  • title (str): Title

  • url (str): URL address

  • doi (str): DOI

  • status (str): Status

  • size (int): Total size of data files in bytes

  • created (datetime.datetime): Creation date and time

  • modified (datetime.datetime): Last modification date and time

Possible statuses are as follows:
  • “draft”: Dataset is not published yet.

  • “public”: Dataset is published and is publicly available.

  • “embargoed”: Dataset is published, but is under embargo.

  • “restricted”: Dataset is published, but accessible only under certain conditions.

  • “closed”: Dataset is published, but accessible only by the owners.

  • “error”: Dataset is in an error state.

  • “unknown”: Dataset is in an unknown state.

Parameters:

id (Dict) – Standard dataset id

Returns:

Details dictionary of the dataset.

abstract get_files(id: Dict) → List[RemoteFile]
get_metadata(id: Dict) → Metadata

Returns standard metadata of the specified dataset

Parameters:

id (Dict) – Standard dataset id

Returns:

Standard metadata

get_versions(id, refresh: bool = False, **kwargs) → List[RemoteDataset]

Returns datasets of all available versions of the specified dataset

Parameters:
  • id – Dataset identifier

  • refresh (bool) – Set True to refresh versions (default = False)

Returns:

List of datasets of all available versions

classmethod normalize_value(name: str, val) → Any

Normalizes a metadata attribute value.

Parameters:
  • name (str) – Attribute name

  • val – Attribute value

Returns:

Normalized attribute value

classmethod parse_id(id: str)

Parses the specified identifier.

Parameters:

id – Identifier.

Returns:

Tuple of identifier type and value.

property repository_id: str

Repository identifier of the client

save_config(save_environment=False) → None

Saves client configuration.

Parameters:

save_environment – Set True to save environment variables (default False)

abstract save_metadata(id: Dict, metadata: Metadata) → None

Saves metadata of the specified dataset.

Parameters:
  • id (Dict) – Standard dataset id.

  • metadata (Metadata) – Metadata to be saved.

abstract classmethod supports_folder() → bool

Returns whether folders are supported.

upload_file(dataset, file, notify: Callable = None) → RemoteFile
abstract validate_metadata(metadata: Metadata) → Dict

Validates metadata

Parameters:

metadata (Metadata) – Metadata to be validated

Returns:

Dictionary of invalid metadata fields and related error messages, if any.