Code Documentation

The package filehooks implements the main functionality for the backend of the CodeAbility Sharing Platform. Because relative imports do not work reliably, the package has to be installed in GitLab.

This package can be installed with pip using the following command.

pip3 install .

Note

When installing this package manually, the API token for GitLab, the email username, and the email password have to be set in conf.production.ini before the installation!

filehooks module

This module can be installed within GitLab to handle events in its projects. Its purpose is checking for the presence of metadata describing the repository’s content and putting valid metadata into an elasticsearch index.

class filehooks.AnalysedCommit(project: gitlab.v4.objects.Project, branch_name: str, commit_hash: str, errors: List[filehooks.MetadataInfo])[source]

Bases: object

Contains information about the commit which was analysed, including the result of the analysis.

branch_name: str
commit_hash: str
errors: List[filehooks.MetadataInfo]
project: gitlab.v4.objects.Project
class filehooks.ConfigType(value)[source]

Bases: enum.Enum

Enum for choosing the desired configuration

DEBUG = 2
LOCAL = 4
PRODUCTION = 0
STAGING = 3
TEST = 1
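
As a rough illustration, the enum can be written as below. The member names and values are the ones listed above; the two lookups are only a usage sketch and not taken from the package.

```python
from enum import Enum


class ConfigType(Enum):
    """Mirror of the documented members and values."""

    PRODUCTION = 0
    TEST = 1
    DEBUG = 2
    STAGING = 3
    LOCAL = 4


# Members can be looked up by value or by name.
by_value = ConfigType(3)
by_name = ConfigType["LOCAL"]
```
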
class filehooks.ErrorMessage(project_info: filehooks.ProjectInfo, metadata_info: List[filehooks.MetadataInfo], footer_msg: str = '', help_msg: str = '')[source]

Bases: object

Collection of all invalid metadata files for a project

create_html() → str[source]

Returns the error message as HTML

Returns:

The HTML

create_plain() → str[source]

Returns the error message as plain text

Returns:

The plain text

footer_msg: str = ''
help_msg: str = ''
metadata_info: List[filehooks.MetadataInfo]
project_info: filehooks.ProjectInfo
class filehooks.EventHandler(gitlab_instance: gitlab.Gitlab, mail: filehooks.Mail, elasticsearch_instance: elasticsearch.Elasticsearch, validation_service: filehooks.ValidationService, git_event: Dict[str, Any])[source]

Bases: object

Class for handling GitLab events

elasticsearch_instance: elasticsearch.Elasticsearch
git_event: Dict[str, Any]
gitlab_instance: gitlab.Gitlab
handle_event() → None[source]

Calls the appropriate function to handle the GitLab system hook events push, project_rename, project_transfer, project_destroy, and group_rename. (https://docs.gitlab.com/ee/system_hooks/system_hooks.html)

Returns:

None
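
The dispatch described above could look roughly like the following sketch. It relies on the fact that GitLab system hook payloads carry their type in the event_name field; the handler factory, the HANDLERS mapping, and the handled list are hypothetical stand-ins for the handle_*_event methods and are not part of filehooks.

```python
from typing import Any, Callable, Dict, List

handled: List[str] = []  # records which handler ran, for illustration only


def make_handler(name: str) -> Callable[[Dict[str, Any]], None]:
    """Hypothetical stand-in for the handle_*_event methods."""
    def handler(event: Dict[str, Any]) -> None:
        handled.append(name)
    return handler


# Event names as listed above; the mapping itself is an assumption.
HANDLERS = {
    name: make_handler(name)
    for name in ("push", "project_rename", "project_transfer",
                 "project_destroy", "group_rename")
}


def handle_event(event: Dict[str, Any]) -> None:
    """Dispatch on the 'event_name' field of a system hook payload."""
    handler = HANDLERS.get(event.get("event_name", ""))
    if handler is not None:
        handler(event)  # unknown events are silently ignored in this sketch


handle_event({"event_name": "push"})
handle_event({"event_name": "user_create"})
```

In this sketch an event without a registered handler is simply dropped; whether the real implementation logs or rejects such events is not stated here.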

handle_group_rename_event() → None[source]

Handles the ‘group_rename’ event.

Returns:

None

handle_project_destroy_event() → None[source]

Handles the ‘project_destroy’ event.

Returns:

None

handle_project_rename_event() → None[source]

Handles the ‘project_rename’ event.

Returns:

None

handle_project_transfer_event() → None[source]

Handles the ‘project_transfer’ event.

Returns:

None

handle_push_event() → None[source]

Handles the ‘push’ event.

Returns:

None

handle_team_change_event() → None[source]

Handles the user_add_to/remove_from_team event.

Returns:

None

mail: filehooks.Mail
validation_service: filehooks.ValidationService
class filehooks.HealthCheck(gitlab_instance: gitlab.Gitlab, validation_service: filehooks.ValidationService, mail: filehooks.Mail)[source]

Bases: object

Handles validation of a project’s metadata and warns users when problems with the metadata are found.

gitlab: gitlab.Gitlab
mail: filehooks.Mail
send_validation_error_mail(git_event: Dict[str, Any], analysed_commit: filehooks.AnalysedCommit) → None[source]

Sends an email warning about validation errors to the user who triggered the validation. Only call this method if there were errors; if the validation did not find any issues, no email should be sent.

Parameters:
  • git_event – the event which triggered the validation

  • analysed_commit – information about the commit which was analysed

validate_project(gitlab_instance: gitlab.Gitlab, project: gitlab.v4.objects.Project, commit: str) → Tuple[List[filehooks.MetadataInfo], Optional[filehooks.Node]][source]

Checks if the project’s metadata is defined correctly. This requires exactly one top-level metadata file, recognised by naming convention. While parsing the metadata, errors are collected in a list. Valid metadata file names are stored as a tree, resembling either an atomic project or a collection.

Parameters:
  • project – the gitlab project to check

  • commit – the hash of the commit to analyze

Returns:

List of validation errors, Tree of metadata (single node if repo is not a collection)

validation_service: filehooks.ValidationService
class filehooks.Indexing(gitlab_instance: gitlab.Gitlab, mail: filehooks.Mail, elasticsearch_instance: elasticsearch.Elasticsearch, index_name: str = 'metadata')[source]

Bases: object

Contains the functionality required to create and update entries in the metadata index.

add_alias(alias: str, indexes: List[str]) → None[source]

Adds an alias for a list of elasticsearch indexes.

Parameters:
  • alias – The alias to be set

  • indexes – The list of indexes to set the alias

Returns:

None

change_main_indexes(indexes_metadata: Tuple[str, str]) → None[source]

Changes the main indexes for the metadata index by removing the alias from the old index and adding the alias to the new index.

Parameters:

indexes_metadata – A pair (old, new) of metadata indexes

Returns:

None
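
The alias switch can be pictured as one remove action and one add action, which the Elasticsearch aliases API applies together. The sketch below only builds that action list; the function name alias_switch_actions and the index names are hypothetical, and how the list is handed to the client depends on the client version, so that call is left out.

```python
from typing import Any, Dict, List, Tuple


def alias_switch_actions(alias: str, indexes: Tuple[str, str]) -> List[Dict[str, Any]]:
    """Build the actions for moving `alias` from the old to the new index."""
    old, new = indexes
    return [
        {"remove": {"index": old, "alias": alias}},
        {"add": {"index": new, "alias": alias}},
    ]


# Hypothetical index names, used only for illustration.
actions = alias_switch_actions("metadata", ("metadata_v1", "metadata_v2"))
```
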

create_new_index() → None[source]

Creates a new index for metadata information. Afterwards the entire main group (sharing-group) is indexed.

Returns:

None

delete_indexes(indexes: List[str]) → None[source]

Deletes a list of elasticsearch indexes.

Parameters:

indexes – the indexes to delete

Returns:

None

elasticsearch: elasticsearch.Elasticsearch
get_alias(alias: str) → Any[source]

Tries to get the elasticsearch alias specified by the argument.

Parameters:

alias – the name of an alias

Returns:

the alias if it exists, None otherwise

get_all_gitlab_users() → list[source]

Returns a list of all users.

Returns:

list of users.

get_all_index_names() → List[str][source]

Returns the names of all indices in the elasticsearch instance.

get_all_projects(group_id: int) → List[gitlab.v4.objects.Project][source]

Returns all projects of a group including projects from subgroups.

Parameters:

group_id – ID of the group

Returns:

List of all projects

get_main_group_id() → int[source]

Returns the group ID of the main group.

Returns:

ID of the group

get_projects_of_user(user: gitlab.v4.objects.User) → list[source]

Returns the list of projects for this user.

Returns:

list of user projects

get_root_groups() → list[source]

Returns the root groups.

Returns:

list of root groups

gitlab_instance: gitlab.Gitlab
group_rename(group_id: int) → None[source]

Handles a group_rename event. Renames all projects of the given group (including subgroups).

Parameters:

group_id – ID of the group to rename

Returns:

None

index_all_metadata(project_permissions: filehooks.ProjectPermissions, metadata_tree: filehooks.Node) → None[source]

Index the metadata of the project.

Parameters:
  • project_permissions – contains information about who may access the project

  • metadata_tree – a tree of valid metadata file paths

Returns:

None

index_all_projects(get_relevant_projects: Callable[[], List[gitlab.v4.objects.Project]], logger_prefix: str = '') → None[source]

Indexes the files of an entire group.

Parameters:
  • get_relevant_projects – method to generate relevant projects

  • logger_prefix – prefix that adds relevant information to the log string

Returns:

None

index_entire_repository(metadata_tree: filehooks.Node) → None[source]

Indexes the files of an entire project

Parameters:

metadata_tree – a tree of valid metadata file paths

Returns:

None

index_metadata_node(project_permissions: filehooks.ProjectPermissions, metadata_tree: filehooks.Node, parent_metadata: Dict[str, Any], parent_id: Optional[str]) → None[source]

Recursive function for indexing the metadata tree of a project.

Parameters:
  • project_permissions – contains information about who may access the project

  • metadata_tree – tree of valid metadata file paths

  • parent_metadata – the parent’s metadata read from the file, used for inheritance

  • parent_id – the id of the parent metadata

Returns:

None

index_name: str
mail: filehooks.Mail
print_all_aliases() → None[source]

Prints information of all elasticsearch aliases on stdout

Returns:

None

print_all_indexes() → None[source]

Prints information of all elasticsearch indices on stdout

Returns:

None

project_destroy(project_id: int) → None[source]

Handles a project_destroy event. Deletes all elements in the index for the given project_id.

Parameters:

project_id – ID of the project

Returns:

None

project_rename(project_id: int, path: str, path_with_namespace: str, url: str) → None[source]

Handles a project_rename event. Updates project_name, namespace, main_group and sub_group for all elements of the given project_id in the metadata index

Parameters:
  • project_id – ID of the project

  • path – the path of the repository (project name)

  • path_with_namespace – repository path with namespace

  • url – GITLAB_URL to the repository

Returns:

None

web_url_project(project_id: int) → str[source]

Returns the web url of a given project id after querying GitLab.

Parameters:

project_id – ID of the project

Returns:

The Url to the project

class filehooks.ItemPath(path: str, metadata_file: str, commit: str, project_id: int, gitlab_instance: gitlab.Gitlab)[source]

Bases: object

Represents a link to another Item (either relative to this path, to another project, or even to another repository).

commit: str

git commit id

create_children_itempath(path: str) → filehooks.ItemPath[source]

Creates a new ItemPath relative to self.

Parameters:

path – an id

doc_id() → str[source]

constructs the doc_id for this item path

get_full_path() → str[source]

returns the full file path

get_metadata_file() → str[source]

returns the base name of the metadata_file, (relative to path)

get_path() → str[source]

returns the path

get_project_id() → int[source]

returns the project id

gitlab_instance: gitlab.Gitlab

a gitlab Instance

gitlab_project: gitlab.v4.objects.Project

the cached gitlab project

metadata_file: str

meta data file name

path: str

directory path either relative to some other Item path, or absolute to root of object

project_id: int

gitlab project id

class filehooks.Mail(username: str, password: str, address: str, mail_from: str, host: str, port: int, homepage: str)[source]

Bases: object

Class for sending e-mails about validation errors.

address: str
classmethod from_dict(config: Dict[str, Any]) → Any[source]

Creates an instance of Mail from the given config.

Parameters:

config – configuration to create the class from

Returns:

the created Mail instance

homepage: str
host: str
mail_from: str
password: str
port: int
send_mail(mail_to: str, mail_subject: str, mail_html: str, mail_plain: str) → None[source]

Sends an e-mail via an SMTP server

Parameters:
  • mail_to – Mail recipient

  • mail_subject – Mail subject

  • mail_html – Mail body as HTML

  • mail_plain – Mail body as plain text

Returns:

None
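
A message with both an HTML and a plain-text body is usually sent as multipart/alternative. The following is a minimal sketch of building such a message with the standard library; the helper build_message and the example addresses are hypothetical and not part of the Mail class.

```python
from email.message import EmailMessage


def build_message(mail_from: str, mail_to: str, mail_subject: str,
                  mail_html: str, mail_plain: str) -> EmailMessage:
    """Build a multipart/alternative message with plain and HTML bodies."""
    message = EmailMessage()
    message["From"] = mail_from
    message["To"] = mail_to
    message["Subject"] = mail_subject
    message.set_content(mail_plain)                     # plain-text part
    message.add_alternative(mail_html, subtype="html")  # HTML alternative
    return message


# Example values are made up for illustration.
message = build_message("noreply@example.org", "dev@example.org",
                        "Invalid metadata", "<p>1 error</p>", "1 error")
```

Sending would then be a matter of handing the message to smtplib, for example via SMTP.send_message, with the login details depending on the mail server.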

username: str
class filehooks.MetadataFileIndexer(elasticsearch: elasticsearch.Elasticsearch, index: str, item: filehooks.ItemPath, project_permissions: filehooks.ProjectPermissions, parent_metadata: Dict[str, Any], parent_id: Optional[str], child_ids: List[str])[source]

Bases: object

File indexer used to handle indexing of a metadata file (metadata.json, metadata.yml, or metadata.yaml) in elasticsearch.

Parameters:
  • elasticsearch – the elasticsearch interface

  • index – the name of the index

  • item – path to the item

  • project_permissions – (read) permissions on this project

  • parent_metadata – the metadata of the parent object (so that it can be inherited)

  • parent_id – the id of the parent

  • child_ids – a collection for child ids

child_ids: List[str]
commit: str
create_json_doc() → Dict[str, Any][source]

Returns an indexable json document for elasticsearch.

Returns:

The document which should be added/updated to/in elasticsearch

elasticsearch: elasticsearch.Elasticsearch
file_path: str
get_file() → gitlab.v4.objects.ProjectFile[source]

Returns a single file of a commit.

Returns:

the file

static get_file_contents(file: gitlab.v4.objects.ProjectFile) → bytes[source]

Returns the file contents.

Parameters:

file – file to extract the file content

Returns:

The file content

get_user_provided_metadata() → Dict[str, Any][source]

Prepares the user provided part of the metadata which is to be stored in the metadata index. Also handles inheritance.

Returns:

the user provided metadata which should be put in the index for the current file

index: str
item: filehooks.ItemPath
parent_id: Optional[str]
parent_metadata: Dict[str, Any]
project: gitlab.v4.objects.Project
project_info() → Dict[str, Any][source]

Returns information about the project.

Returns:

The project information

project_info_metadata() → Dict[str, Any][source]

Returns a dictionary containing the data which should be put as the value of the ‘project’ key in the metadata index

Returns:

dictionary containing the project metadata

project_permissions: filehooks.ProjectPermissions
update_doc() → None[source]

Updates an existing document in elasticsearch or creates a new one if no document with this id exists.

Returns:

None

class filehooks.MetadataInfo(url: Optional[str], filename: Optional[str], errors: List[str] = <factory>, warnings: List[str] = <factory>)[source]

Bases: object

Represents a metadata file with errors

errors: List[str]
filename: Optional[str]
url: Optional[str]
warnings: List[str]
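
The <factory> defaults in the signature suggest a dataclass whose list fields use default_factory, so that every instance gets its own lists. A minimal sketch under that assumption (the example file names and error text are made up):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetadataInfo:
    """Sketch of a dataclass with the documented fields."""

    url: Optional[str]
    filename: Optional[str]
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


first = MetadataInfo(url=None, filename="metadata.yml")
second = MetadataInfo(url=None, filename="metadata.json")
first.errors.append("missing title")  # does not affect `second`
```
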
exception filehooks.NoBranchToIndexError[source]

Bases: Exception

Error raised when a project has no branch with a name appearing in BRANCH_PRIORITY.

class filehooks.Node(item: filehooks.ItemPath)[source]

Bases: object

A simple tree structure for representing a metadata collection

children: List[Any]
get_all_paths() → List[filehooks.ItemPath][source]

Returns a list of all Nodes’ paths reachable from this node (the tree having this node as the root flattened into a list).

Returns:

all Nodes’ paths reachable from this node

get_child_paths() → List[filehooks.ItemPath][source]

Returns a list containing paths of all direct child nodes of this node.

Returns:

the paths of all direct child nodes

item: filehooks.ItemPath
print_tree() → None[source]

Logs a visual representation of the tree having this node as the root. Siblings have the same indentation. A node’s children are the nodes logged directly below whose indentation level is one more than the parent node’s.

Returns:

None

print_tree_recursive(level: int) → None[source]

Used internally to recursively generate a visual representation of the tree.

Parameters:

level – distance from the root

Returns:

None

str_children() → str[source]

returns a readable string representation (nearly as print_tree) on one line
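
The tree operations above can be sketched as follows. To keep the example self-contained, item is a plain string rather than an ItemPath, the pre-order traversal is an assumption, and tree_lines is a hypothetical helper that returns the indented lines instead of logging them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Tree node; `item` is a plain string here instead of an ItemPath."""

    item: str
    children: List["Node"] = field(default_factory=list)

    def get_child_paths(self) -> List[str]:
        return [child.item for child in self.children]

    def get_all_paths(self) -> List[str]:
        # Pre-order traversal: this node first, then each subtree.
        paths = [self.item]
        for child in self.children:
            paths.extend(child.get_all_paths())
        return paths

    def tree_lines(self, level: int = 0) -> List[str]:
        # One line per node, indented by its distance from the root.
        lines = ["  " * level + self.item]
        for child in self.children:
            lines.extend(child.tree_lines(level + 1))
        return lines


root = Node("metadata.yml", [Node("a/metadata.yml", [Node("a/b/metadata.yml")]),
                             Node("c/metadata.yml")])
```
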

exception filehooks.PathError[source]

Bases: ValueError

Represents an error in a file path.

class filehooks.ProjectInfo(event: dataclasses.InitVar[typing.Dict[str, typing.Any]], branch_name: dataclasses.InitVar[str], commit: dataclasses.InitVar[ProjectCommit])[source]

Bases: object

Represents a project with erroneous metadata file(s)

branch: str
branch_name: dataclasses.InitVar[str]
branch_url: str
commit: dataclasses.InitVar[ProjectCommit]
commit_author: str
commit_id: str
commit_message: str
commit_url: str
event: dataclasses.InitVar[typing.Dict[str, typing.Any]]
repository: str
repository_url: str
urls: List[Tuple[str, str]]
user_avatar: str
class filehooks.ProjectPermissions(email_addresses_with_access: List[str], groups_with_access: List[str])[source]

Bases: object

Data class for storing who has read access to a project.

email_addresses_with_access: List[str]
groups_with_access: List[str]
class filehooks.ValidationResult[source]

Bases: object

type that is returned by the validation service

errors: List[str]
is_empty() → bool[source]
warnings: List[str]
class filehooks.ValidationService(rest_url: str = 'http://localhost:8080/api/validateMetaData')[source]

Bases: object

Handles the validation of projects’ metadata and collects errors and warnings

custom_validation_result_decoder(resultDict) → filehooks.ValidationResult[source]
rest_url: str = 'http://localhost:8080/api/validateMetaData'
validate_metadata(metadata: dict, top_level: bool) → filehooks.ValidationResult[source]
filehooks.calculate_all_groups_members(groups: List[gitlab.v4.objects.Group], gitlab_instance: gitlab.Gitlab) → List[str][source]
filehooks.calculate_group_members(group: gitlab.v4.objects.Group) → List[str][source]
filehooks.calculate_project_members(project: gitlab.v4.objects.Project, gitlab_instance: gitlab.Gitlab) → Tuple[List[str], List[str]][source]

Returns a list of user emails allowed to read the project and a list of group names with read access to the project.

filehooks.check_and_index_project(project_id: int, gitlab_instance: gitlab.Gitlab, validation_service: filehooks.ValidationService, mail: filehooks.Mail, elasticsearch_instance: elasticsearch.Elasticsearch, metadata_mandatory: bool) → Optional[filehooks.AnalysedCommit][source]

Validates and indexes the project with the given id. Does not send error notifications to the user, but returns information about the analysed commit, including a (potentially empty) list of validation errors.

Parameters:
  • project_id – the id of the project to index

  • gitlab_instance – the GitLab instance

  • mail – Mail object for sending error messages

  • elasticsearch_instance – the elasticsearch instance

Returns:

information about the analysed commit or None if no suitable commit can be obtained

filehooks.check_for_mandatory_fields_on_toplevel(file_content: Dict[str, int]) → Iterator[jsonschema.exceptions.ValidationError][source]

checks the file_content for entries that are mandatory on the top level

filehooks.check_for_single_metadata_files(metadata_files: List[str]) → List[filehooks.MetadataInfo][source]

Checks if there is exactly one metadata file.

Parameters:

metadata_files – List of metadata files

Returns:

List of errors
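
A simplified sketch of this check is shown below. It returns plain strings rather than MetadataInfo objects, uses a slightly different function name to mark it as hypothetical, and the message wording is invented for illustration.

```python
from typing import List


def check_for_single_metadata_file(metadata_files: List[str]) -> List[str]:
    """Return error messages unless exactly one metadata file is present."""
    if not metadata_files:
        return ["No metadata file found in the repository root."]
    if len(metadata_files) > 1:
        found = ", ".join(sorted(metadata_files))
        return [f"Expected one metadata file, found {len(metadata_files)}: {found}"]
    return []
```
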

filehooks.check_if_file_exists(path: filehooks.ItemPath) → Tuple[bool, str][source]

Checks if a file specified by path exists.

Parameters:

path – a normalized path

Returns:

True if the file exists, False otherwise

filehooks.deduplicate_paths(paths: List[filehooks.ItemPath]) → Tuple[List[filehooks.ItemPath], List[filehooks.ItemPath]][source]

Takes a list of paths and checks for duplicates. If duplicates are found, they are returned in the first element of the return tuple. The second element of the return tuple contains the de-duplicated list of paths. The order of the paths will be preserved. For duplicates, the first element will be kept.

Parameters:

paths – a list of normalized paths, potentially containing duplicates.

Returns:

tuple consisting of a list of duplicate paths and the de-duplicated list.
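
The behaviour described above can be sketched with plain strings in place of ItemPath objects. Whether a path that occurs three times is reported twice, as it is here, is an assumption.

```python
from typing import List, Tuple


def deduplicate_paths(paths: List[str]) -> Tuple[List[str], List[str]]:
    """Return (duplicates, de-duplicated paths), keeping first occurrences."""
    seen = set()
    duplicates: List[str] = []
    unique: List[str] = []
    for path in paths:
        if path in seen:
            duplicates.append(path)
        else:
            seen.add(path)
            unique.append(path)
    return duplicates, unique


duplicates, unique = deduplicate_paths(["a", "b", "a", "c", "b", "a"])
```
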

filehooks.generate_duplicate_path_warning(path: filehooks.ItemPath) → str[source]

Generates a message warning about a duplicate path in a metadata file’s collectionContent.

Parameters:

path – the path appearing more than once

Returns:

a message warning about a duplicate path in a metadata file’s collectionContent

filehooks.get_branch_to_index(project: gitlab.v4.objects.Project) → str[source]

Returns the name of the branch which should be indexed, following the priority given by BRANCH_PRIORITY. Raises NoBranchToIndexError if no branch with a name appearing in BRANCH_PRIORITY exists.

Parameters:

project – The project whose branches should be checked

Returns:

The name of the branch to index

Raises:

NoBranchToIndexError – when no branch with a name in BRANCH_PRIORITY exists
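
The priority lookup might be implemented along these lines. The contents of BRANCH_PRIORITY are hypothetical, since the real list is not shown here, and the sketch takes a list of branch names instead of a GitLab project.

```python
from typing import List, Sequence

# Hypothetical priority list; the real BRANCH_PRIORITY is not shown here.
BRANCH_PRIORITY: Sequence[str] = ("main", "master")


class NoBranchToIndexError(Exception):
    """Raised when no branch name from BRANCH_PRIORITY exists."""


def get_branch_to_index(branch_names: List[str]) -> str:
    """Return the first name in BRANCH_PRIORITY that exists in the project."""
    existing = set(branch_names)
    for candidate in BRANCH_PRIORITY:
        if candidate in existing:
            return candidate
    raise NoBranchToIndexError(f"none of {list(BRANCH_PRIORITY)} exists")
```
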

filehooks.get_commit_to_index(project_id: int, gitlab_instance: gitlab.Gitlab) → Tuple[gitlab.v4.objects.Project, str, str][source]

Tries to obtain the project, branch name and commit of the project which should be indexed. Raises a NoBranchToIndexError if no suitable branch is found.

Parameters:
  • project_id – the id of the GitLab project

  • gitlab_instance – the GitLab instance

Returns:

project, branch name, commit hash

Raises:

NoBranchToIndexError – if no suitable branch exists

filehooks.get_indexing_commit(project: gitlab.v4.objects.Project) → Tuple[str, str][source]

Tries to obtain the branch name and commit of the project which should be indexed.

Parameters:

project – the project which should be indexed

Returns:

branch name, commit hash

Raises:

NoBranchToIndexError – if no suitable branch exists

filehooks.get_project_for_id(project_id: int, gitlab_instance: gitlab.Gitlab) → gitlab.v4.objects.Project[source]

Tries to obtain the project from git.

Parameters:
  • project_id – the id of the GitLab project

  • gitlab_instance – the GitLab instance

filehooks.get_relevant_commit_hash(project: gitlab.v4.objects.Project, branch_name: str) → str[source]

Gets the hash of the latest commit on the branch to index for the given project.

Parameters:
  • project – the project to get the hash for

  • branch_name – name of the branch of which the latest commit should be shown

Returns:

the hash of the latest commit on the branch to index.

filehooks.get_repository_metadata_files(project: gitlab.v4.objects.Project, commit: str) → List[str][source]

Returns all metadata files in the repository’s root.

Parameters:
  • project – the project instance

  • commit – the hash of the commit to analyze

Returns:

List of metadata files

filehooks.in_main_group(path_with_namespace: str) → bool[source]

Checks if the root of the given namespace is in the main group

Parameters:

path_with_namespace – path to the repository to be checked

Returns:

True, if in main_group
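
One way to picture this check is to compare the first path component with the name of the main group. The constant MAIN_GROUP and its value are assumptions; the configured name may differ.

```python
# Assumed name of the main group; the configured value may differ.
MAIN_GROUP = "sharing-group"  # hypothetical


def in_main_group(path_with_namespace: str) -> bool:
    """Check whether the first path component is the main group."""
    root = path_with_namespace.strip("/").split("/", 1)[0]
    return root == MAIN_GROUP
```
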

filehooks.is_regular_file(mode: str) → bool[source]

Returns true if the file mode corresponds to a regular file and false otherwise.

Parameters:

mode – file mode

Returns:

True, if regular file.
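
Assuming the mode is the octal string git uses for tree entries (for example '100644' for a regular file and '040000' for a directory), the check could be sketched with the standard stat module:

```python
import stat


def is_regular_file(mode: str) -> bool:
    """Interpret `mode` as an octal git file mode such as '100644'."""
    try:
        return stat.S_ISREG(int(mode, 8))
    except ValueError:
        return False  # not an octal string
```
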

filehooks.load_config(config_type: filehooks.ConfigType = <ConfigType.PRODUCTION: 0>) → Tuple[gitlab.Gitlab, filehooks.Mail, elasticsearch.Elasticsearch][source]

Parses the configuration in filehooks/config.ini. This file only exists when deployed in the GitLab container and is a copy of one of the files in filehooks/conf/. Which file is used depends on the configuration when setting up GitLab.

filehooks.logger_setup(filepath: str) → Dict[str, Any][source]

Returns a dictionary which can be used to configure a logger.

Parameters:

filepath – path of the log file

Returns:

a dictionary to configure a logger
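
Such a dictionary is typically passed to logging.config.dictConfig. The sketch below shows one possible shape; the format string, handler name, and log level are assumptions rather than the package's actual settings.

```python
import logging.config
import os
import tempfile
from typing import Any, Dict


def logger_setup(filepath: str) -> Dict[str, Any]:
    """Return a dictConfig-style configuration that logs to `filepath`."""
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "default": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
        },
        "handlers": {
            "file": {
                "class": "logging.FileHandler",
                "filename": filepath,
                "formatter": "default",
            },
        },
        "root": {"level": "INFO", "handlers": ["file"]},
    }


# Usage: configure logging to a temporary file and write one record.
log_path = os.path.join(tempfile.mkdtemp(), "filehooks.log")
logging.config.dictConfig(logger_setup(log_path))
logging.getLogger("filehooks").info("logger configured")
```
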

filehooks.normalize_collection_content_paths(collection_content: List[str], parent: filehooks.ItemPath) → Tuple[List[str], List[filehooks.ItemPath]][source]

Takes the “collectionContent” list from a metadata file, normalizes the paths, and warns about issues, such as duplicate paths.

Parameters:
  • collection_content – the “collectionContent” list from a metadata file

  • parent – normalized path of the metadata file from which “collectionContent” is taken

Returns:

tuple consisting of a list of error messages and a list of normalized file paths without duplicates

filehooks.normalize_path(child_path: str, parent: filehooks.ItemPath, parent_path: Optional[str] = None) → str[source]

Takes a potentially un-normalized child path and normalizes it, checking for potential errors. If the path can be normalized and is valid, the normalized path is returned as a str. Otherwise, a PathError is raised.

Parameters:
  • child_path – potentially un-normalized path from the parent’s “collectionContent”

  • parent – the parent of this child

  • parent_path – normalized path of a metadata file

Returns:

the normalized path

Raises:

PathError – when the path cannot be normalized or is invalid
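
The following sketch covers only the simplest case, a path relative to the parent's directory within the same repository. It takes a plain directory string instead of an ItemPath, and both the treatment of a leading slash and the rule that a path must not leave the repository are assumptions.

```python
import posixpath


class PathError(ValueError):
    """Represents an error in a file path."""


def normalize_path(child_path: str, parent_dir: str) -> str:
    """Resolve `child_path` against `parent_dir` inside the repository."""
    if not child_path.strip():
        raise PathError("empty path")
    if child_path.startswith("/"):
        # Assumed convention: a leading slash means "from the repository root".
        joined = child_path.lstrip("/")
    else:
        joined = posixpath.join(parent_dir, child_path)
    normalized = posixpath.normpath(joined)
    if normalized == ".." or normalized.startswith("../"):
        raise PathError(f"path leaves the repository: {child_path}")
    return normalized
```
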

filehooks.parse_child_path(child_path: str) → Tuple[str, int, str][source]

parses a child_path and returns the git url, the project id, and the relative path

filehooks.parse_metadata_file(file_contents: bytes, extension: str) → Dict[str, Any][source]

Given the base64 encoded content of a file and the extension of the file, this function attempts to parse the content according to the format specified by the extension. If the extension is ‘.json’ it tries to parse it as a json file, otherwise the data is treated as YAML.

Parameters:
  • file_contents – base64 encoded content of the file to parse

  • extension – the file extension, including a leading dot. One of {‘.json’, ‘.yaml’, ‘.yml’}

Returns:

If successful, a dictionary with the parsed key-value pairs

Raises:

ValueError – if the file contents cannot be parsed as a dictionary
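
The parsing steps described above can be sketched as follows. The YAML branch assumes that PyYAML is installed and imports it lazily; the error messages and the sample document are invented for illustration.

```python
import base64
import json
from typing import Any, Dict


def parse_metadata_file(file_contents: bytes, extension: str) -> Dict[str, Any]:
    """Decode base64 content and parse it as JSON or YAML."""
    text = base64.b64decode(file_contents).decode("utf-8")
    if extension == ".json":
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError as error:
            raise ValueError(f"invalid JSON: {error}") from error
    else:
        import yaml  # PyYAML, assumed to be installed for YAML files
        try:
            parsed = yaml.safe_load(text)
        except yaml.YAMLError as error:
            raise ValueError(f"invalid YAML: {error}") from error
    if not isinstance(parsed, dict):
        raise ValueError("metadata must be a mapping at the top level")
    return parsed


encoded = base64.b64encode(b'{"title": "Sorting", "keyword": ["lists"]}')
parsed = parse_metadata_file(encoded, ".json")
```
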

filehooks.path_2_key(validation_error: jsonschema.exceptions.ValidationError) → str[source]

Returns the path to the key for which a validation error occurred.

Parameters:

validation_error – Validation error

Returns:

Path to validation error

filehooks.read_gitlab_event() → Optional[Dict[str, Any]][source]

Reads the GitLab system hook event from stdin.

Returns:

The event
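
GitLab passes the event to a hook as JSON on stdin. A minimal sketch of reading it is given below; the optional stream parameter is an addition that makes the example testable, and returning None for unreadable input is an assumption suggested by the Optional return type.

```python
import io
import json
import sys
from typing import Any, Dict, Optional, TextIO


def read_gitlab_event(stream: Optional[TextIO] = None) -> Optional[Dict[str, Any]]:
    """Read one JSON event from `stream` (stdin by default)."""
    source = stream if stream is not None else sys.stdin
    try:
        event = json.loads(source.read())
    except json.JSONDecodeError:
        return None  # assumed behaviour for unreadable input
    return event if isinstance(event, dict) else None


event = read_gitlab_event(io.StringIO('{"event_name": "project_destroy", "project_id": 7}'))
```
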

filehooks.unify_keywords(file_contents: Dict[str, Any]) → Dict[str, Any][source]

Given a dictionary representing the user provided metadata, this function normalizes some keywords. Currently the only thing it does is converting the programming language names to titlecase.

Parameters:

file_contents – dictionary representing the user provided metadata

Returns:

the input dictionary with normalized keywords
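
A sketch of the title-case normalization follows. The key name programmingLanguage is hypothetical, since the metadata schema is not shown here, and the sketch modifies the input dictionary in place.

```python
from typing import Any, Dict

# Hypothetical key name; the real metadata schema is not shown here.
LANGUAGE_KEY = "programmingLanguage"


def unify_keywords(file_contents: Dict[str, Any]) -> Dict[str, Any]:
    """Convert programming language names to title case, in place."""
    languages = file_contents.get(LANGUAGE_KEY)
    if isinstance(languages, list):
        file_contents[LANGUAGE_KEY] = [str(name).title() for name in languages]
    return file_contents


metadata = unify_keywords({"title": "Loops", LANGUAGE_KEY: ["python", "JAVA"]})
```
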

filehooks.validate_collection(gitlab_instance: gitlab.Gitlab, parent: filehooks.ItemPath, collection_content: List[str], schema: Any, nesting: int, visited_parents: List[filehooks.ItemPath] = []) → Tuple[List[str], List[filehooks.MetadataInfo], List[filehooks.Node]][source]

Parses a metadata file’s collectionContent. Returns a tuple consisting of a list of errors in the parent’s collectionContent specification, a list of errors obtained recursively from the children, and a list of child nodes.

Parameters:
  • parent – ItemPath to current meta data

  • collection_content – the metadata file’s collectionContent (list of paths)

  • schema – a dictionary which can be used by jsonschema Draft 7

  • nesting – nesting level of children (0 is top most)

Returns:

a tuple consisting of a list of errors in the parent’s collectionContent specification, a list of errors obtained recursively from the children, and a list of child nodes

filehooks.validate_metadata_file(gitlab_instance: gitlab.Gitlab, validation_service: filehooks.ValidationService, metadata_file_path: filehooks.ItemPath, schema: Any, nesting: int, visited_parents: List[filehooks.ItemPath] = []) → Tuple[List[filehooks.MetadataInfo], Optional[filehooks.Node]][source]

Validates a single metadata file. A file is valid iff it can be parsed in the format specified by its extension (JSON or YAML) and the content conforms to the given schema.

Parameters:
  • metadata_file_path – the file to validate

  • schema – a dictionary which can be used by jsonschema Draft 7

  • nesting – nesting level of children (0 is top most)

Returns:

Information about the metadata file, including a potentially empty list of errors.

scripts

Collection of all scripts which can be installed in GitLab to extend its functionality.

Note

Those scripts assume that the package filehooks is installed!

Tests

Tests are written using the Python testing framework pytest.

Unit tests

Unit tests can be found in tests/filehooks and can be executed manually by the following command:

pytest --cov-report term-missing --cov=filehooks/  tests/filehooks

After each change to the code, make sure that code coverage remains at 100%. On every push event, the unit tests are executed automatically by GitLab CI/CD.

Integration tests

Since the code depends heavily on GitLab’s behavior, which is mocked in the unit tests, integration tests are provided to check that the entire system works as expected.

Note

The integration tests’ current implementation is not very robust: some tests may fail when executed too quickly, even though they would pass if given enough time.

The integration tests are not executed automatically by GitLab CI/CD. Those tests can be executed locally by running ./run_integration_tests.sh.

In order to run the integration tests in isolation, they run in a dedicated set of containers. These are created automatically upon calling the script mentioned above, if they do not exist already. They have the same names as the containers which are created for production, but with the postfix _integration. Data used by these containers and the container running the integration tests is stored in /tmp/sharing/integration/ by default. This location can be configured in the run_integration_tests.sh script.

Note

Some tests use the configuration file filehooks/conf/conf.test.ini. Please ensure that the correct values are set before test execution.

Note

The containers use quite a lot of memory, so running the integration tests while the normal set of containers is running could be problematic on systems with too little memory/swap space.

Linter

The tools pylint and flake8 are used for static code analysis. Experience shows that pylint is stricter and more verbose. However, if flake8 reports a potential issue, it is worth checking. Some default settings of the tools had to be adjusted. The pylint score should always reach 10 points. If a reported issue is acceptable as it is, suppress it.

Automated code checks

This project uses git hooks to automatically check the code. Git hooks are scripts which are automatically executed upon certain git events. They can be installed on the client side (the developer’s machine) and on the server side (the git server which hosts the repository, e.g. a GitLab instance).

Installation

For development, a client-side hook is used to check code that is about to be committed. This is done by a pre-commit hook. Client-side hooks need to be installed by the developer on their machine:

  1. Install the python package pre-commit, e.g. by running pip install pre-commit. For other installation methods, see https://pre-commit.com/#installation.

  2. Go to the root directory of the project and run pre-commit install.

Working with code checks in place

Once these two steps have succeeded, git automatically runs some checks on the code before every commit. If any check fails, the commit is aborted.

Some checks that auto-format files fail if they change anything. When this happens, the changes they make are not added to the git index and need to be added manually, e.g. by running git add <auto-formatted file>, in order to include them in the commit.

Some IDEs which allow committing from within them might not do a good job at displaying the error messages. Running git commit from a terminal should give much better feedback, including colored messages and information about which check is running at the moment.

Checks in use

The pre-commit hook runs several checks. The detailed configuration is found in the pre-commit configuration file .pre-commit-config.yaml. These checks are run:

  • some generic hooks (pre-commit defaults)

  • isort sorts import statements (changes are not added to git index automatically)

  • black formats code (changes are not added to git index automatically)

  • mypy checks type annotations

  • pyright checks type annotations

  • flake8 linter

  • pylint linter

  • pytest runs unit tests

Please keep in mind that the integration tests are not run automatically. Since it takes several minutes to run them, it would not be reasonable to include them here.

Disabling checks

Sometimes it might be necessary to temporarily disable one of the checks. One way to do this is to force the commit, which skips all checks. However, this is discouraged, since most checks will probably still be useful. For example, when pylint finds an issue in the code which you intend to fix in a later commit, it still makes sense to auto-format the code. In such cases the SKIP environment variable can be used:

SKIP=pylint,flake8 git commit

This command sets the variable for a single git commit invocation. The variable takes a comma-separated list of checks to skip. The names of the checks can be found in the pre-commit configuration file, .pre-commit-config.yaml, as the values of the id keys.
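To illustrate where the names come from, the following Python sketch collects the id values from a pre-commit configuration and builds an environment with SKIP set. The helper names hook_ids and env_with_skip are hypothetical. The scan is a simplified line-based match rather than a real YAML parser, so it assumes unquoted ids written as "id: name" on their own line.

```python
import os
import re
from typing import Dict, Iterable, List, Optional

# Matches lines such as "  - id: black" or "    id: pylint  # comment".
_ID_LINE = re.compile(r"^\s*-?\s*id:\s*([A-Za-z0-9_.-]+)\s*(?:#.*)?$")


def hook_ids(config_text: str) -> List[str]:
    """Return the hook ids found in the text of .pre-commit-config.yaml."""
    ids = []
    for line in config_text.splitlines():
        match = _ID_LINE.match(line)
        if match:
            ids.append(match.group(1))
    return ids


def env_with_skip(
    hooks: Iterable[str], base_env: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with SKIP set to the given hooks."""
    env = dict(os.environ if base_env is None else base_env)
    env["SKIP"] = ",".join(hooks)  # pre-commit expects a comma-separated list
    return env
```

The resulting dictionary can be passed as the env argument of subprocess.run(["git", "commit"], env=...), which has the same effect as the shell form shown above.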

GitLab CI

Since the git-hook based checks need to be installed by each developer on their machine, it could happen that someone forgets to use them. GitLab continuous integration jobs are used to run most of the same checks in our GitLab instance. This requires a GitLab runner to be installed.

The configuration is found in .gitlab-ci.yml.

Each push to GitLab starts a pipeline which runs the configured tools. Linter failures are reported as warnings, while unit test failures are errors that abort the pipeline. The command-line output of each tool can be viewed in GitLab, and in some cases artifacts can be downloaded.

Command-line Interface

A command-line interface for manipulating elasticsearch indexes is provided. It supports the following operations:

  • List indexes (list-index)

  • Create index (create-index)

  • Delete index (delete-index)

  • Delete unused metadata indices (delete-unused-indices)

  • List aliases (list-alias)

  • Switch aliases (switch-alias)

  • Reindex (reindex)

Workflow

  1. Start a shell in the sharing_gitlab docker container

    docker exec -it sharing_gitlab /bin/bash
    
  2. Navigate to utils (e.g., /file-hooks-src/utils)

  3. Run python3 cli.py with the appropriate arguments

    Note

    Running python3 cli.py with no arguments yields the help output.
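For scripted use, the three steps above can be folded into a single docker exec call. The Python sketch below only builds the argument list. The function name docker_cli_command is hypothetical, the container name is taken from step 1, and the directory is the example path from step 2, which may differ in your setup.

```python
from typing import List

CONTAINER = "sharing_gitlab"       # container name from step 1
CLI_DIR = "/file-hooks-src/utils"  # example path from step 2 (assumption)


def docker_cli_command(*cli_args: str) -> List[str]:
    """Build a docker exec command that runs cli.py inside the container.

    -i keeps stdin open so that confirmation prompts can be answered, and
    --workdir replaces the manual "navigate to utils" step.
    """
    return [
        "docker", "exec", "-i",
        "--workdir", CLI_DIR,
        CONTAINER,
        "python3", "cli.py", *cli_args,
    ]
```

The list can be handed to subprocess.run, e.g. subprocess.run(docker_cli_command("list-index"), check=True).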

Commands

list-index

  • Functionality: Lists all elasticsearch indexes

  • Usage:

    python3 cli.py list-index
    

create-index

  • Functionality: Creates new indexes for metadata information

  • Usage:

    python3 cli.py create-index \
        -idx-metadata IDX_METADATA
    
  • Arguments:

    • IDX_METADATA: Name of new index for metadata information

delete-index

  • Functionality: Deletes an elasticsearch index

  • Usage:

    python3 cli.py delete-index IDX
    
  • Arguments:

    • IDX: Name of index to be deleted

delete-unused-indices

  • Functionality: Deletes currently unused metadata indices in elasticsearch

  • Usage:

    python3 cli.py delete-unused-indices
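How the CLI decides that an index is unused is not documented here. A plausible reading, sketched below in Python, is that a metadata index counts as unused when no alias points to it. The function name unused_indices and the idx_metadata_ prefix (taken from the index names in the example further down) are assumptions.

```python
from typing import Dict, Iterable, List


def unused_indices(
    all_indices: Iterable[str],
    alias_to_index: Dict[str, str],
    prefix: str = "idx_metadata_",  # assumed naming scheme for metadata indices
) -> List[str]:
    """Return metadata indices that are not the target of any alias."""
    in_use = set(alias_to_index.values())
    return sorted(
        name
        for name in all_indices
        if name.startswith(prefix) and name not in in_use
    )
```

With the indices idx_metadata_0 and idx_metadata_1 and the alias metadata pointing at idx_metadata_1, this yields only idx_metadata_0.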
    

list-alias

  • Functionality: Lists all elasticsearch aliases

  • Usage:

    python3 cli.py list-alias
    

switch-alias

  • Functionality: Removes the alias for the old metadata index and adds a new alias for the new metadata index

  • Usage:

    python3 cli.py switch-alias \
        --old-idx-metadata OLD_IDX_METADATA \
        --new-idx-metadata NEW_IDX_METADATA
    
  • Arguments:

    • OLD_IDX_METADATA: Name of old index for metadata information

    • NEW_IDX_METADATA: Name of new index for metadata information
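In Elasticsearch terms, removing one alias and adding another can be expressed as a single request to the _aliases endpoint, which applies all listed actions atomically. The Python sketch below builds such a request body. Whether cli.py does exactly this is an assumption, the function name alias_switch_body is hypothetical, and the default alias name metadata is taken from the list-alias output in the example section.

```python
from typing import Any, Dict


def alias_switch_body(
    old_index: str, new_index: str, alias: str = "metadata"
) -> Dict[str, Any]:
    """Build the body of a POST /_aliases request that moves an alias.

    Both actions are sent in one request, so searches against the alias
    never see a moment in which it points to no index.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```

For example, alias_switch_body("idx_metadata_0", "idx_metadata_1") describes the same switch as the switch-alias call in the example section.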

Note

  • Running python3 cli.py -h yields the help output.

  • Running python3 cli.py <command> -h yields the help for the specified command.

reindex

  • Functionality: Creates a new index for metadata, fills it and switches the alias.

  • Usage:

    python3 cli.py reindex
    

Example

Assuming you have executed steps 1 and 2 of the workflow, a reindexing session might look like this:

$ python3 cli.py list-index
list-index
health status index          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   idx_metadata_0 KGQ0aWacREKM44aEYLWkNA   1   1          4            3     41.1kb         41.1kb


$ python3 cli.py list-alias
list-alias
alias    index          filter routing.index routing.search is_write_index
metadata idx_metadata_0 -      -             -              true


$ python3 cli.py create-index idx_metadata_1


$ python3 cli.py list-index
list-index
health status index          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   idx_metadata_1 LaA64koiRESp4eF-Of1GTA   1   1          4            0     27.9kb         27.9kb
yellow open   idx_metadata_0 KGQ0aWacREKM44aEYLWkNA   1   1          4            3     41.1kb         41.1kb


$ python3 cli.py switch-alias --old-idx-metadata idx_metadata_0 --new-idx-metadata idx_metadata_1
switch-alias
You are about to switch the following alias:
'idx_metadata_0 -> 'idx_metadata_1'
Would you like to continue? [Y/n]
Y
The aliases were switched!


$ python3 cli.py list-alias
list-alias
alias    index          filter routing.index routing.search is_write_index
metadata idx_metadata_1 -      -             -              true


$ python3 cli.py delete-index idx_metadata_0
delete-index
You are about to delete the indexes in the list ['idx_metadata_0']. Would you like to continue? [Y/n]
Y
The indexes in the list ['idx_metadata_0'] were deleted!

Warning

If users add content between the index creation and switching of the alias, this content might not be indexed until it is changed again!
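When scripting around the CLI, the tabular output of list-index and list-alias shown in the example can be turned into dictionaries, for instance to check which index an alias points to before deleting anything. The Python sketch below assumes the output format shown above: an optional line echoing the command name, a header row, and whitespace-separated columns without embedded spaces. The function name parse_cli_table is hypothetical.

```python
from typing import Dict, List


def parse_cli_table(output: str) -> List[Dict[str, str]]:
    """Parse whitespace-separated table output into one dict per row."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    # The example output starts with a line echoing the command name;
    # skip such single-word lines before the header row.
    while rows and len(rows[0]) == 1:
        rows.pop(0)
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]
```

Applied to the last list-alias output above, the first row's "index" entry is idx_metadata_1.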