fairly.dataset package
Submodules
fairly.dataset.local module
- class fairly.dataset.local.LocalDataset(path: str, auto_refresh: bool = True)
Bases:
Dataset
- _path
Path of the dataset
- Type:
str
- _manifest_path
Path of the dataset manifest
- Type:
str
- _includes
File inclusion rules
- Type:
set
- _excludes
File exclusion rules
- Type:
set
- _md5s
MD5 checksum cache of the files
- Type:
Dict
- _yaml
YAML object
- Class Attributes:
_regexps (Dict): Regular expression cache of the file rules
- property created: datetime
Creation date and time of the dataset
- property excludes: Set
Exclusion rules of the dataset files
- get_remote_dataset(remote=None) → RemoteDataset
- property includes: Set
Inclusion rules of the dataset files
- property modified: datetime
Last modification date and time of the dataset
- property path: str
Path of the dataset
- pull(source=None, notify: Callable = None) → RemoteDataset
Pulls changes made to the metadata and files from the data repository to update the local dataset. The dataset must exist in the data repository.
- Parameters:
source – Source repository identifier or client. If not specified, the identifier in the manifest is used.
notify (Callable) – Notification callback function.
- Returns:
Remote dataset
- Raises:
ValueError("No source dataset") – If source dataset is not specified.
- push(target=None, notify: Callable = None) → RemoteDataset
Pushes local changes to the metadata and files to the data repository to update the remote dataset. The dataset must exist in the data repository.
- Parameters:
target – Target repository identifier or client. If not specified, the identifier in the manifest is used.
notify (Callable) – Notification callback function.
- Returns:
Remote dataset
- Raises:
ValueError("No target dataset") – If target dataset is not specified.
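`pull()` and `push()` share a fallback rule: an explicitly passed source or target wins, otherwise the identifier recorded in the manifest is used, and an error is raised if neither is available. A minimal sketch of that resolution logic, with `resolve_repository` as a hypothetical helper and plain strings standing in for repository identifiers or clients:

```python
from typing import Optional, Union


def resolve_repository(
    explicit: Optional[Union[str, object]],
    manifest_id: Optional[str],
    direction: str,
) -> Union[str, object]:
    """Pick the repository to synchronize with (hypothetical helper).

    `direction` is "source" for pull and "target" for push, so the error
    messages match the documented ones.
    """
    if explicit is not None:
        # An identifier or client passed by the caller takes precedence.
        return explicit
    if manifest_id:
        # Fall back to the identifier stored in the dataset manifest.
        return manifest_id
    raise ValueError(f"No {direction} dataset")
```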
- property remote_datasets: Dict
Known remote datasets of the dataset.
- reproduce() → LocalDataset
Reproduces an actual copy of the dataset.
- save_files(force: bool = False) → None
Stores the file list of the dataset, if the dataset exists.
- Parameters:
force (bool) – Set True to enforce save even if the existing dataset is modified (default False).
- Raises:
Warning("Existing dataset is modified") – If the existing dataset is modified.
- property size: int
Total size of the dataset in bytes.
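One plausible way to compute a total size in bytes for a local dataset, assuming it is simply the sum of the sizes of the regular files under the dataset path (include/exclude rules and the manifest file are ignored for brevity). `directory_size` is a hypothetical helper, not fairly's implementation:

```python
import os


def directory_size(path: str) -> int:
    """Sum the sizes, in bytes, of all regular files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symbolic links so linked files are not counted twice.
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total
```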
- property template: str
Metadata template of the dataset
- property title: str
Title of the dataset.
- upload(repository=None, notify: Callable = None, strategy: str = 'auto', force: bool = False) → RemoteDataset
Uploads dataset to the repository.
- Available upload strategies:
auto: Mirror if folders are supported, otherwise archive folders individually.
mirror: Upload files and folders as they are.
archive_all: Create a single archive file for all files and folders.
archive_folders: Create an individual archive file for each folder.
- Parameters:
repository – Repository identifier or client. If not specified, the template identifier is used.
notify (Callable) – Notification callback function.
strategy (str) – Folder upload strategy (default = “auto”)
force (bool) – Set True to upload dataset even if a remote version exists (default = False)
- Returns:
Remote dataset
- Raises:
ValueError("Invalid repository") – If repository argument is invalid.
ValueError("Invalid upload strategy") – If upload strategy is invalid.
ValueError("Invalid archiving method") – If archiving method is invalid.
ValueError("Invalid archive name") – If archive name is invalid.
Warning("Remote dataset exists") – If remote dataset exists.
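The upload strategies can be sketched as two small functions: one resolves `auto` according to whether the repository supports folders, and one groups relative file paths into upload units. The function names, the `supports_folders` flag, the archive names (`dataset.zip`, `<folder>.zip`), and uploading top-level files unarchived under `archive_folders` are all assumptions for illustration:

```python
from collections import defaultdict
from typing import Dict, List

STRATEGIES = {"auto", "mirror", "archive_all", "archive_folders"}


def resolve_strategy(strategy: str, supports_folders: bool) -> str:
    """Map the requested strategy to a concrete one, following the
    documented rule for "auto"."""
    if strategy not in STRATEGIES:
        raise ValueError("Invalid upload strategy")
    if strategy == "auto":
        return "mirror" if supports_folders else "archive_folders"
    return strategy


def plan_uploads(
    paths: List[str], strategy: str, supports_folders: bool
) -> Dict[str, List[str]]:
    """Group relative POSIX paths into upload units (hypothetical helper).

    Keys name what would be uploaded (a file or an archive); values list
    the dataset files that unit contains.
    """
    strategy = resolve_strategy(strategy, supports_folders)
    plan: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(paths):
        top, sep, _rest = path.partition("/")
        if strategy == "archive_all":
            plan["dataset.zip"].append(path)
        elif strategy == "archive_folders" and sep:
            # One archive per top-level folder.
            plan[top + ".zip"].append(path)
        else:
            # "mirror", or a top-level file under "archive_folders".
            plan[path].append(path)
    return dict(plan)
```

For example, with `supports_folders=False`, `auto` turns `raw/a.csv` and `raw/b.csv` into a single `raw.zip` unit while `README.md` is uploaded as is.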
fairly.dataset.remote module
- class fairly.dataset.remote.RemoteDataset(client, id=None, auto_refresh: bool = True, **kwargs)
Bases:
Dataset
- _client
Client object
- Type:
Client
- _id
Dataset identifier
- Type:
str
- _details
Dataset details
- Type:
Dict
- property client: Client
Client of the dataset.
- property created: datetime
Creation date and time of the dataset
- property doi: str
DOI of the dataset.
- get_versions() → List[RemoteDataset]
Returns all available versions of the dataset.
- Returns:
List of remote datasets of all available versions.
- property id: Dict
Identifier of the dataset.
- property modified: datetime
Last modification date and time of the dataset
- property plain_id: str
Plain identifier of the dataset.
- reproduce() → RemoteDataset
Reproduces an actual copy of the dataset.
- property size: int
Total size of the dataset in bytes.
- property status: str
Status of the dataset.
- Possible statuses are as follows:
“draft”: Dataset is not published yet.
“public”: Dataset is published and is publicly available.
“embargoed”: Dataset is published, but is under embargo.
“restricted”: Dataset is published, but accessible only under certain conditions.
“closed”: Dataset is published, but accessible only by the owners.
“error”: Dataset is in an error state.
“unknown”: Dataset is in an unknown state.
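Because `status` is a plain string, calling code may want to validate it. A sketch of a hypothetical `DatasetStatus` enum (not part of fairly) covering the statuses listed above, mapping unrecognized values to `unknown` and grouping the statuses whose descriptions say the dataset is published:

```python
from enum import Enum


class DatasetStatus(str, Enum):
    """The status strings documented for RemoteDataset.status."""

    DRAFT = "draft"
    PUBLIC = "public"
    EMBARGOED = "embargoed"
    RESTRICTED = "restricted"
    CLOSED = "closed"
    ERROR = "error"
    UNKNOWN = "unknown"

    @classmethod
    def parse(cls, value: str) -> "DatasetStatus":
        """Convert a raw status string, falling back to UNKNOWN."""
        try:
            return cls(value)
        except ValueError:
            return cls.UNKNOWN

    @property
    def is_published(self) -> bool:
        """True for the statuses described as published above."""
        return self in {
            DatasetStatus.PUBLIC,
            DatasetStatus.EMBARGOED,
            DatasetStatus.RESTRICTED,
            DatasetStatus.CLOSED,
        }
```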
- store(path: str = None, notify: Callable = None, extract: bool = False, max_workers: int = None) → LocalDataset
Stores the dataset to a local directory.
If no path is provided, the DOI is used as the directory name, with slashes and backslashes replaced by underscores. The local directory is created if it does not exist.
- Parameters:
path (str) – Path to the local directory (optional).
notify (Callable) – Notification callback method (optional).
extract (bool) – Set True to extract archive files (default False).
max_workers (int) – Number of workers (optional).
- Returns:
LocalDataset object of the stored local dataset.
- Raises:
ValueError("Empty path") – If the local path is empty.
ValueError("Directory is not empty") – If the local directory is not empty.
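The default-path rule for `store()` can be sketched as follows. `default_store_path` and `prepare_store_dir` are hypothetical helpers, and raising `Empty path` when neither a path nor a DOI is available is an assumption about when that error occurs:

```python
import os
from typing import Optional


def default_store_path(doi: str) -> str:
    """Derive a directory name from a DOI by replacing slashes and
    backslashes with underscores, as documented for store()."""
    return doi.replace("/", "_").replace("\\", "_")


def prepare_store_dir(path: Optional[str], doi: Optional[str]) -> str:
    """Resolve and create the target directory (hypothetical helper)."""
    if not path:
        # Assumption: fall back to the DOI only when no path is given.
        path = default_store_path(doi) if doi else ""
    if not path:
        raise ValueError("Empty path")
    os.makedirs(path, exist_ok=True)  # create the directory if missing
    if os.listdir(path):
        raise ValueError("Directory is not empty")
    return path
```

For instance, a DOI such as `10.5281/zenodo.123` would map to the directory name `10.5281_zenodo.123`.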
- property title: str
Title of the dataset.
- property url: str
URL address of the dataset.
Module contents
Dataset class module.
The Dataset class represents datasets in a standardized manner. It is an abstract class.
- Implementations:
LocalDataset, RemoteDataset
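The abstract-class pattern can be illustrated with Python's `abc` module: abstract properties must be overridden before a subclass can be instantiated. `BaseDataset` is a reduced, hypothetical stand-in for `Dataset` (it declares only `title`, `size`, and `modified`, whereas `Dataset` also declares `created` and `reproduce()`), and `InMemoryDataset` is a toy implementation, not one of fairly's classes:

```python
from abc import ABC, abstractmethod
from datetime import datetime
from typing import Dict


class BaseDataset(ABC):
    """Reduced stand-in for Dataset with a few abstract properties."""

    @property
    @abstractmethod
    def title(self) -> str: ...

    @property
    @abstractmethod
    def size(self) -> int: ...

    @property
    @abstractmethod
    def modified(self) -> datetime: ...


class InMemoryDataset(BaseDataset):
    """Toy implementation backed by a dict of file contents."""

    def __init__(self, title: str, files: Dict[str, bytes]):
        self._title = title
        self._files = files
        self._modified = datetime.now()

    @property
    def title(self) -> str:
        return self._title

    @property
    def size(self) -> int:
        # Total size in bytes, mirroring the documented `size` property.
        return sum(len(data) for data in self._files.values())

    @property
    def modified(self) -> datetime:
        return self._modified
```

Instantiating `BaseDataset` directly raises `TypeError`, which is how an abstract base class forces implementations such as `LocalDataset` and `RemoteDataset` to provide every abstract member.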
- class fairly.dataset.Dataset(auto_refresh: bool = False)
Bases:
ABC
Dataset class.
- _metadata
Metadata.
- Type:
- _files
Files list.
- Type:
list
- _modified
Last known modification date.
- Type:
datetime.datetime
- _auto_refresh
Auto-refresh flag.
- Type:
bool
- property auto_refresh: bool
Auto-refresh flag of the dataset.
- abstract property created: datetime
Creation date and time of the dataset.
- file(val: str) → File
Returns the specified file of the dataset.
Automatically refreshes file information if the dataset is modified.
- property files: List[File]
List of files of the dataset.
- get_file(val: str, refresh: bool = False) → File
Returns specified file of the dataset.
- Parameters:
val (str) – File identifier.
refresh (bool) – Set True to enforce file information retrieval.
- Returns:
File object if file is found, None otherwise.
- get_files(refresh: bool = False) → Dict[str, File]
Returns dictionary of files of the dataset.
- Parameters:
refresh (bool) – Set True to enforce file list retrieval.
- Returns:
Dictionary of files of the dataset. Keys are paths, values are File objects.
- get_metadata(refresh: bool = False) → Metadata
Returns metadata of the dataset.
- Parameters:
refresh (bool) – Set True to enforce metadata retrieval (default False).
- Returns:
Metadata of the dataset.
- property is_modified: bool
Checks if the existing dataset is modified.
- Returns:
True if the existing dataset is modified, False otherwise.
- property metadata: Metadata
Metadata of the dataset.
Refreshes the metadata automatically if the metadata object has not been modified by the user, the auto-refresh flag is set, and the metadata has been modified externally.
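The three refresh conditions can be written as a single predicate. In this sketch, `should_refresh_metadata` is a hypothetical helper, and "modified externally" is assumed to mean that the dataset's current modification time is later than the last known one (compare the `_modified` attribute):

```python
from datetime import datetime


def should_refresh_metadata(
    user_modified: bool,
    auto_refresh: bool,
    last_known: datetime,
    current: datetime,
) -> bool:
    """Return True only when all three documented conditions hold."""
    # Assumption: an external change shows up as a newer modification time.
    externally_modified = current > last_known
    return (not user_modified) and auto_refresh and externally_modified
```

Checking `user_modified` first means local, unsaved edits are never silently discarded by an automatic refresh.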
- abstract property modified: datetime
Last modification date and time of the dataset.
- abstract reproduce() → Dataset
Reproduces an actual copy of the dataset.
- save_metadata(force: bool = False) → None
Stores the dataset metadata, if the dataset exists.
- Parameters:
force (bool) – Set True to enforce save even if existing dataset is modified (default False).
- Raises:
Warning("Existing dataset is modified") – If dataset is modified.
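`save_metadata()` and `save_files()` both guard against overwriting an externally modified dataset. Since `Warning` is a subclass of `Exception` in Python, it can be raised and caught like any other exception. A minimal sketch of the guard and of a caller that retries with `force=True`; `save_with_guard` and the dict used as storage are hypothetical:

```python
def save_with_guard(
    store: dict, metadata: dict, is_modified: bool, force: bool = False
) -> None:
    """Hypothetical stand-in for save_metadata(): writes `metadata` into `store`."""
    if is_modified and not force:
        raise Warning("Existing dataset is modified")
    store.clear()
    store.update(metadata)


# Caller pattern: try a normal save first, and force it only after deciding
# that overwriting the external changes is acceptable.
store = {"title": "Changed elsewhere"}
try:
    save_with_guard(store, {"title": "Local title"}, is_modified=True)
except Warning as exc:
    print(f"Not saved: {exc}")
    save_with_guard(store, {"title": "Local title"}, is_modified=True, force=True)
```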
- set_metadata(**kwargs) → None
Sets metadata attributes.
- Parameters:
**kwargs – Metadata attributes.
- abstract property size: int
Total size of the dataset in bytes.
- abstract property title: str
Title of the dataset.