Code Documentation¶
The package filehooks
implements the main functionality for the backend of the CodeAbility Sharing Platform.
Since relative imports do not work reliably, this package has to be installed in GitLab.
This package can be installed with pip
using the following command.
pip3 install .
Note
When installing this package manually, the API token for GitLab, the email username, and the email password have to be set in conf.production.ini before the installation!
filehooks module¶
This module can be installed within GitLab to handle events in its projects. Its purpose is checking for the presence of metadata describing the repository’s content and putting valid metadata into an elasticsearch index.
-
class
filehooks.
AnalysedCommit
(project: gitlab.v4.objects.Project, branch_name: str, commit_hash: str, errors: List[filehooks.MetadataInfo])[source]¶ Bases:
object
Contains information about the commit which was analysed, including the result of the analysis.
-
branch_name
: str¶
-
commit_hash
: str¶
-
errors
: List[filehooks.MetadataInfo]¶
-
project
: gitlab.v4.objects.Project¶
-
-
class
filehooks.
ConfigType
(value)[source]¶ Bases:
enum.Enum
Enum for choosing the desired configuration
-
DEBUG
= 2¶
-
LOCAL
= 4¶
-
PRODUCTION
= 0¶
-
STAGING
= 3¶
-
TEST
= 1¶
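A minimal sketch of how such an enum could be used to pick a configuration file. The member values mirror the ones listed above. The helper `config_file_name` and the naming pattern `conf.<name>.ini` are assumptions, inferred only from the file names conf.production.ini and conf.test.ini mentioned in this documentation.

```python
from enum import Enum


class ConfigType(Enum):
    """Mirror of the documented enum members and their values."""

    PRODUCTION = 0
    TEST = 1
    DEBUG = 2
    STAGING = 3
    LOCAL = 4


def config_file_name(config_type: ConfigType) -> str:
    # Hypothetical helper: maps a member to a file name such as
    # "conf.production.ini". Only the production and test file names
    # are confirmed by this documentation.
    return f"conf.{config_type.name.lower()}.ini"


print(config_file_name(ConfigType.PRODUCTION))
```

Looking a member up by value, e.g. `ConfigType(0)`, returns `ConfigType.PRODUCTION`, which matches the default used by load_config.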
-
-
class
filehooks.
ErrorMessage
(project_info: filehooks.ProjectInfo, metadata_info: List[filehooks.MetadataInfo], footer_msg: str = '', help_msg: str = '')[source]¶ Bases:
object
Collection of all invalid metadata files for a project
-
help_msg
: str = ''¶
-
metadata_info
: List[filehooks.MetadataInfo]¶
-
project_info
: filehooks.ProjectInfo¶
-
-
class
filehooks.
EventHandler
(gitlab_instance: gitlab.Gitlab, mail: filehooks.Mail, elasticsearch_instance: elasticsearch.Elasticsearch, validation_service: filehooks.ValidationService, git_event: Dict[str, Any])[source]¶ Bases:
object
Class for handling GitLab events
-
elasticsearch_instance
: elasticsearch.Elasticsearch¶
-
git_event
: Dict[str, Any]¶
-
gitlab_instance
: gitlab.Gitlab¶
-
handle_event
() → None[source]¶ Calls the appropriate function to handle the GitLab system hook events push, project_rename, project_transfer, project_destroy, and group_rename. (https://docs.gitlab.com/ee/system_hooks/system_hooks.html)
- Returns:
None
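The dispatch described for handle_event can be pictured as a lookup table keyed by the event name. The following is only a sketch under assumptions: the handler functions and the `HANDLERS` table are hypothetical, and the actual class may dispatch differently. The `event_name` key is the field GitLab system hooks use to name the event.

```python
from typing import Any, Callable, Dict, List

handled: List[str] = []  # records which handler ran, for illustration only


def on_push(event: Dict[str, Any]) -> None:
    handled.append("push")


def on_project_change(event: Dict[str, Any]) -> None:
    handled.append(event["event_name"])


# Hypothetical dispatch table keyed by the system hook's "event_name".
HANDLERS: Dict[str, Callable[[Dict[str, Any]], None]] = {
    "push": on_push,
    "project_rename": on_project_change,
    "project_transfer": on_project_change,
    "project_destroy": on_project_change,
    "group_rename": on_project_change,
}


def handle_event(event: Dict[str, Any]) -> None:
    handler = HANDLERS.get(event.get("event_name", ""))
    if handler is not None:  # events without a handler are ignored
        handler(event)


handle_event({"event_name": "project_destroy", "project_id": 42})
handle_event({"event_name": "user_create"})
print(handled)
```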
-
handle_team_change_event
() → None[source]¶ Handles the user_add_to/remove_from_team event.
- Returns:
None
-
mail
: filehooks.Mail¶
-
validation_service
: filehooks.ValidationService¶
-
-
class
filehooks.
HealthCheck
(gitlab_instance: gitlab.Gitlab, validation_service: filehooks.ValidationService, mail: filehooks.Mail)[source]¶ Bases:
object
Handles validation of a project’s metadata and warns users when problems with the metadata are found.
-
gitlab
: gitlab.Gitlab¶
-
mail
: filehooks.Mail¶
-
send_validation_error_mail
(git_event: Dict[str, Any], analysed_commit: filehooks.AnalysedCommit) → None[source]¶ Sends an email about validation errors to the user who triggered the validation. Only call this method if there were errors; if the validation did not find any issues, no email should be sent.
- Parameters:
git_event – the event which triggered the validation
analysed_commit – information about the commit which was analysed
-
validate_project
(gitlab_instance: gitlab.Gitlab, project: gitlab.v4.objects.Project, commit: str) → Tuple[List[filehooks.MetadataInfo], Optional[filehooks.Node]][source]¶ Checks if the project’s metadata is defined correctly. This requires exactly one top-level metadata file, recognised by naming convention. While parsing the metadata, errors are collected in a list. Valid metadata file names are stored as a tree, resembling either an atomic project or a collection.
- Parameters:
project – the gitlab project to check
commit – the hash of the commit to analyze
- Returns:
List of validation errors, Tree of metadata (single node if repo is not a collection)
-
validation_service
: filehooks.ValidationService¶
-
-
class
filehooks.
Indexing
(gitlab_instance: gitlab.Gitlab, mail: filehooks.Mail, elasticsearch_instance: elasticsearch.Elasticsearch, index_name: str = 'metadata')[source]¶ Bases:
object
Contains the functionality required to create and update entries in the metadata index.
-
add_alias
(alias: str, indexes: List[str]) → None[source]¶ Adds an alias for a list of elasticsearch indexes.
- Parameters:
alias – The alias to be set
indexes – The list of indexes to set the alias
- Returns:
None
-
change_main_indexes
(indexes_metadata: Tuple[str, str]) → None[source]¶ Changes the main indexes for the metadata index by removing the alias from the old index and adding the alias to the new index.
- Parameters:
indexes_metadata – A pair (old, new) of metadata indexes
- Returns:
None
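As an illustration of the alias switch, the sketch below builds a request body for elasticsearch's alias API, which accepts a list of remove and add actions in a single request. The function name `alias_switch_body` is hypothetical, and whether the package builds the request this way is an assumption. The alias and index names are taken from the command-line example in this documentation.

```python
from typing import Any, Dict, Tuple


def alias_switch_body(alias: str, indexes: Tuple[str, str]) -> Dict[str, Any]:
    """Build the body for an alias update: remove from old, add to new."""
    old_index, new_index = indexes
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = alias_switch_body("metadata", ("idx_metadata_0", "idx_metadata_1"))
print(body)
```

Sending both actions in one request avoids a window in which the alias points to no index at all.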
-
create_new_index
() → None[source]¶ Creates a new index for metadata information. Afterwards the entire main group (sharing-group) is indexed.
- Returns:
None
-
delete_indexes
(indexes: List[str]) → None[source]¶ Deletes a list of elasticsearch indexes.
- Parameters:
indexes – the list of indexes to delete
- Returns:
None
-
elasticsearch
: elasticsearch.Elasticsearch¶
-
get_alias
(alias: str) → Any[source]¶ Tries to get the elasticsearch alias specified by the argument.
- Parameters:
alias – the name of an alias
- Returns:
the alias if it exists, None otherwise
-
get_all_index_names
() → List[str][source]¶ Returns the names of all indices in the elasticsearch instance.
-
get_all_projects
(group_id: int) → List[gitlab.v4.objects.Project][source]¶ Returns all projects of a group including projects from subgroups.
- Parameters:
group_id – ID of the group
- Returns:
List of all projects
-
get_projects_of_user
(user: gitlab.v4.objects.User) → list[source]¶ Returns the list of projects for this user.
- Returns:
list of user projects
-
gitlab_instance
: gitlab.Gitlab¶
-
group_rename
(group_id: int) → None[source]¶ Handles a group_rename event. Renames all projects of the given group (including subgroups).
- Parameters:
group_id – ID of the group to rename
- Returns:
None
-
index_all_metadata
(project_permissions: filehooks.ProjectPermissions, metadata_tree: filehooks.Node) → None[source]¶ Index the metadata of the project.
- Parameters:
project_permissions – contains information about who may access the project
metadata_tree – a tree of valid metadata file paths
- Returns:
None
-
index_all_projects
(get_relevant_projects: Callable[[], List[gitlab.v4.objects.Project]], logger_prefix: str = '') → None[source]¶ Indexes the files of an entire group
- Parameters:
get_relevant_projects – method to generate relevant projects
logger_prefix – prefix that adds relevant information to log messages
- Returns:
None
-
index_entire_repository
(metadata_tree: filehooks.Node) → None[source]¶ Indexes the files of an entire project
- Parameters:
metadata_tree – a tree of valid metadata file paths
- Returns:
None
-
index_metadata_node
(project_permissions: filehooks.ProjectPermissions, metadata_tree: filehooks.Node, parent_metadata: Dict[str, Any], parent_id: Optional[str]) → None[source]¶ Recursive function for indexing the metadata tree of a project.
- Parameters:
project_permissions – contains information about who may access the project
metadata_tree – tree of valid metadata file paths
parent_metadata – the parent’s metadata read from the file, used for inheritance
parent_id – the id of the parent metadata
- Returns:
None
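The recursion with inheritance can be sketched as follows. This is not the package's implementation: `SketchNode` is a hypothetical stand-in for the metadata tree, documents are collected in a list instead of being sent to elasticsearch, and the inheritance rule (a child's own keys override the parent's) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SketchNode:
    """Stand-in for a metadata tree node (hypothetical, not filehooks.Node)."""

    node_id: str
    metadata: Dict[str, Any]
    children: List["SketchNode"] = field(default_factory=list)


def index_node(
    node: SketchNode,
    parent_metadata: Dict[str, Any],
    parent_id: Optional[str],
    out: List[Dict[str, Any]],
) -> None:
    # Assumed inheritance rule: the child's own keys override the parent's.
    merged = {**parent_metadata, **node.metadata}
    out.append({"id": node.node_id, "parent_id": parent_id, "metadata": merged})
    for child in node.children:
        # The merged metadata becomes the parent metadata of each child.
        index_node(child, merged, node.node_id, out)


tree = SketchNode(
    "root",
    {"license": "MIT", "title": "Course"},
    [SketchNode("ex1", {"title": "Exercise 1"})],
)
docs: List[Dict[str, Any]] = []
index_node(tree, {}, None, docs)
print(docs)
```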
-
index_name
: str¶
-
mail
: filehooks.Mail¶
-
print_all_aliases
() → None[source]¶ Prints information of all elasticsearch aliases on stdout
- Returns:
None
-
print_all_indexes
() → None[source]¶ Prints information of all elasticsearch indices on stdout
- Returns:
None
-
project_destroy
(project_id: int) → None[source]¶ Handles a project_destroy event. Deletes all elements in the index for the given project_id.
- Parameters:
project_id – ID of the project
- Returns:
None
-
project_rename
(project_id: int, path: str, path_with_namespace: str, url: str) → None[source]¶ Handles a project_rename event. Updates project_name, namespace, main_group and sub_group for all elements of the given project_id in the metadata index
- Parameters:
project_id – ID of the project
path – the path of the repository (project name)
path_with_namespace – repository path with namespace
url – GITLAB_URL to the repository
- Returns:
None
-
-
class
filehooks.
ItemPath
(path: str, metadata_file: str, commit: str, project_id: int, gitlab_instance: gitlab.Gitlab)[source]¶ Bases:
object
Represents a link to another Item (either relative to this path, to another project, or even to another repository).
-
commit
: str¶ git commit id
-
create_children_itempath
(path: str) → filehooks.ItemPath[source]¶ Creates a new ItemPath relative to self.
- Parameters:
path – an id
-
gitlab_instance
: gitlab.Gitlab¶ a gitlab Instance
-
gitlab_project
: gitlab.v4.objects.Project¶ the cached gitlab project
-
metadata_file
: str¶ meta data file name
-
path
: str¶ directory path either relative to some other Item path, or absolute to root of object
-
project_id
: int¶ gitlab project id
-
-
class
filehooks.
Mail
(username: str, password: str, address: str, mail_from: str, host: str, port: int, homepage: str)[source]¶ Bases:
object
Class for sending e-mails about validation errors.
-
address
: str¶
-
classmethod
from_dict
(config: Dict[str, Any]) → Any[source]¶ Creates an instance of Mail from the given config.
- Parameters:
config – configuration to create the class from
- Returns:
the created Mail instance
-
homepage
: str¶
-
host
: str¶
-
mail_from
: str¶
-
password
: str¶
-
port
: int¶
-
send_mail
(mail_to: str, mail_subject: str, mail_html: str, mail_plain: str) → None[source]¶ Sends an e-mail via an SMTP server
- Parameters:
mail_to – Mail recipient
mail_subject – Mail subject
mail_html – Mail body as HTML
mail_plain – Mail body as plain text
- Returns:
None
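A message carrying both an HTML and a plain-text body is commonly built as a multipart/alternative e-mail. The sketch below uses only the Python standard library; the function name `build_message` and the example addresses are hypothetical, and it is an assumption that the package constructs its messages in a comparable way.

```python
from email.message import EmailMessage


def build_message(
    mail_from: str, mail_to: str, subject: str, html: str, plain: str
) -> EmailMessage:
    """Build a multipart/alternative message with plain and HTML bodies."""
    message = EmailMessage()
    message["From"] = mail_from
    message["To"] = mail_to
    message["Subject"] = subject
    message.set_content(plain)  # plain-text part
    message.add_alternative(html, subtype="html")  # HTML part
    return message


msg = build_message(
    "noreply@example.org",
    "user@example.org",
    "Metadata validation failed",
    "<p>1 error found</p>",
    "1 error found",
)
print(msg.get_content_type())
```

Such a message could then be handed to `smtplib.SMTP(host, port).send_message(msg)`; whether login or TLS is required depends on the SMTP server.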
-
username
: str¶
-
-
class
filehooks.
MetadataFileIndexer
(elasticsearch: elasticsearch.Elasticsearch, index: str, item: filehooks.ItemPath, project_permissions: filehooks.ProjectPermissions, parent_metadata: Dict[str, Any], parent_id: Optional[str], child_ids: List[str])[source]¶ Bases:
object
File indexer used to handle indexing of a metadata file (metadata.json, metadata.yml, or metadata.yaml) in elasticsearch.
- Parameters:
elasticsearch – the elasticsearch interface
index – the name of the index
item – path to the item
project_permissions – (read) permissions on this project
parent_metadata – the metadata of the parent object (so that it can be inherited)
parent_id – the id of the parent
child_ids – a collection for child ids
-
child_ids
: List[str]¶
-
commit
: str¶
-
create_json_doc
() → Dict[str, Any][source]¶ Returns an indexable json document for elasticsearch.
- Returns:
The document which should be added/updated to/in elasticsearch
-
elasticsearch
: elasticsearch.Elasticsearch¶
-
file_path
: str¶
-
get_file
() → gitlab.v4.objects.ProjectFile[source]¶ Returns a single file of a commit.
- Returns:
the file
-
static
get_file_contents
(file: gitlab.v4.objects.ProjectFile) → bytes[source]¶ Returns the file contents.
- Parameters:
file – file to extract the file content
- Returns:
The file content
-
get_user_provided_metadata
() → Dict[str, Any][source]¶ Prepares the user provided part of the metadata which is to be stored in the metadata index. Also handles inheritance.
- Returns:
the user provided metadata which should be put in the index for the current file
-
index
: str¶
-
item
: filehooks.ItemPath¶
-
parent_id
: Optional[str]¶
-
parent_metadata
: Dict[str, Any]¶
-
project
: gitlab.v4.objects.Project¶
-
project_info
() → Dict[str, Any][source]¶ Returns information about the project.
- Returns:
The project information
-
project_info_metadata
() → Dict[str, Any][source]¶ Returns a dictionary containing the data which should be put as the value of the ‘project’ key in the metadata index
- Returns:
dictionary containing the project metadata
-
project_permissions
: filehooks.ProjectPermissions¶
-
-
class
filehooks.
MetadataInfo
(url: Optional[str], filename: Optional[str], errors: List[str] = <factory>, warnings: List[str] = <factory>)[source]¶ Bases:
object
Represents a metadata file with errors
-
errors
: List[str]¶
-
filename
: Optional[str]¶
-
url
: Optional[str]¶
-
warnings
: List[str]¶
-
-
exception
filehooks.
NoBranchToIndexError
[source]¶ Bases:
Exception
Error raised when a project has no branch with a name appearing in BRANCH_PRIORITY.
-
class
filehooks.
Node
(item: filehooks.ItemPath)[source]¶ Bases:
object
A simple tree structure for representing a metadata collection
-
children
: List[Any]¶
-
get_all_paths
() → List[filehooks.ItemPath][source]¶ Returns a list of all Nodes’ paths reachable from this node (the tree having this node as the root flattened into a list).
- Returns:
all Nodes’ paths reachable from this node
-
get_child_paths
() → List[filehooks.ItemPath][source]¶ Returns a list containing paths of all direct child nodes of this node.
- Returns:
the paths of all direct child nodes
-
item
: filehooks.ItemPath¶
-
print_tree
() → None[source]¶ Logs a visual representation of the tree having this node as the root. Siblings have the same indentation. A node’s children are the nodes logged directly below whose indentation level is one more than the parent node’s.
- Returns:
None
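The tree operations described for Node can be sketched with a small stand-in class. `TreeNode` is hypothetical and uses plain strings instead of ItemPath objects. The pre-order traversal, the inclusion of the root in the flattened list, and the two-space indentation are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    """Stand-in for filehooks.Node; `item` is a plain string path here."""

    item: str
    children: List["TreeNode"] = field(default_factory=list)

    def get_child_paths(self) -> List[str]:
        return [child.item for child in self.children]

    def get_all_paths(self) -> List[str]:
        # Pre-order traversal: this node first, then each subtree.
        paths = [self.item]
        for child in self.children:
            paths.extend(child.get_all_paths())
        return paths

    def render(self, depth: int = 0) -> List[str]:
        # One line per node, indented according to its depth in the tree.
        lines = ["  " * depth + self.item]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


root = TreeNode(
    "metadata.yml",
    [
        TreeNode("a/metadata.yml", [TreeNode("a/b/metadata.yml")]),
        TreeNode("c/metadata.yml"),
    ],
)
print("\n".join(root.render()))
```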
-
-
class
filehooks.
ProjectInfo
(event: dataclasses.InitVar[typing.Dict[str, typing.Any]], branch_name: dataclasses.InitVar[str], commit: dataclasses.InitVar[ProjectCommit])[source]¶ Bases:
object
Represents a project with erroneous metadata file(s)
-
branch
: str¶
-
branch_name
: dataclasses.InitVar[str]¶
-
branch_url
: str¶
-
commit
: dataclasses.InitVar[ProjectCommit]¶
-
commit_id
: str¶
-
commit_message
: str¶
-
commit_url
: str¶
-
event
: dataclasses.InitVar[typing.Dict[str, typing.Any]]¶
-
repository
: str¶
-
repository_url
: str¶
-
urls
: List[Tuple[str, str]]¶
-
user_avatar
: str¶
-
-
class
filehooks.
ProjectPermissions
(email_addresses_with_access: List[str], groups_with_access: List[str])[source]¶ Bases:
object
Data class for storing who has read access to a project.
-
email_addresses_with_access
: List[str]¶
-
groups_with_access
: List[str]¶
-
-
class
filehooks.
ValidationResult
[source]¶ Bases:
object
type that is returned by the validation service
-
errors
: List[str]¶
-
warnings
: List[str]¶
-
-
class
filehooks.
ValidationService
(rest_url: str = 'http://localhost:8080/api/validateMetaData')[source]¶ Bases:
object
Handles the validation of projects’ metadata and collects errors and warnings
-
custom_validation_result_decoder
(resultDict) → filehooks.ValidationResult[source]¶
-
rest_url
: str = 'http://localhost:8080/api/validateMetaData'¶
-
validate_metadata
(metadata: dict, top_level: bool) → filehooks.ValidationResult[source]¶
-
-
filehooks.
calculate_all_groups_members
(groups: List[gitlab.v4.objects.Group], gitlab_instance: gitlab.Gitlab) → List[str][source]¶
-
filehooks.
calculate_project_members
(project: gitlab.v4.objects.Project, gitlab_instance: gitlab.Gitlab) → Tuple[List[str], List[str]][source]¶ Returns a list of user emails allowed to read the project and a list of group names with read access to the project.
-
filehooks.
check_and_index_project
(project_id: int, gitlab_instance: gitlab.Gitlab, validation_service: filehooks.ValidationService, mail: filehooks.Mail, elasticsearch_instance: elasticsearch.Elasticsearch, metadata_mandatory: bool) → Optional[filehooks.AnalysedCommit][source]¶ Validates and indexes the project with the given id. Does not send error notifications to the user, but returns information about the analysed commit, including a (potentially empty) list of validation errors.
- Parameters:
project_id – the id of the project to index
gitlab_instance – the GitLab instance
mail – Mail object for sending error messages
elasticsearch_instance – the elasticsearch instance
- Returns:
information about the analysed commit or None if no suitable commit can be obtained
-
filehooks.
check_for_mandatory_fields_on_toplevel
(file_content: Dict[str, int]) → Iterator[jsonschema.exceptions.ValidationError][source]¶ Checks the file_content for entries that are mandatory on the top level.
-
filehooks.
check_for_single_metadata_files
(metadata_files: List[str]) → List[filehooks.MetadataInfo][source]¶ Checks if there is exactly one metadata file.
- Parameters:
metadata_files – List of metadata files
- Returns:
List of errors
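A sketch of the "exactly one metadata file" rule. The file names are the ones this documentation mentions for metadata files. The function name and the error texts are hypothetical, and plain strings are returned here instead of MetadataInfo objects.

```python
from typing import List

# File names mentioned in this documentation for metadata files.
METADATA_FILE_NAMES = ("metadata.json", "metadata.yml", "metadata.yaml")


def single_metadata_file_errors(metadata_files: List[str]) -> List[str]:
    """Return an empty list if and only if exactly one file is present."""
    if len(metadata_files) == 1:
        return []
    if not metadata_files:
        return ["No metadata file found in the repository root."]
    found = ", ".join(sorted(metadata_files))
    return [f"Expected exactly one metadata file, found: {found}"]


print(single_metadata_file_errors(["metadata.yml", "metadata.json"]))
```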
-
filehooks.
check_if_file_exists
(path: filehooks.ItemPath) → Tuple[bool, str][source]¶ Checks if a file specified by path exists.
- Parameters:
path – a normalized path
- Returns:
True if the file exists, False otherwise
-
filehooks.
deduplicate_paths
(paths: List[filehooks.ItemPath]) → Tuple[List[filehooks.ItemPath], List[filehooks.ItemPath]][source]¶ Takes a list of paths and checks for duplicates. If duplicates are found, they are returned in the first element of the return tuple. The second element of the return tuple contains the de-duplicated list of paths. The order of the paths will be preserved. For duplicates, the first element will be kept.
- Parameters:
paths – a list of normalized paths, potentially containing duplicates.
- Returns:
tuple consisting of a list of duplicate paths and the de-duplicated list.
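An order-preserving de-duplication of this kind can be sketched as below. The sketch works on any hashable values and uses strings in the example; whether ItemPath is hashable, and whether a path repeated several times is reported once or once per repetition, are assumptions (here: once per repetition).

```python
from typing import Hashable, List, Tuple, TypeVar

T = TypeVar("T", bound=Hashable)


def deduplicate(paths: List[T]) -> Tuple[List[T], List[T]]:
    """Return (duplicates, unique paths), keeping first occurrences in order."""
    seen = set()
    duplicates: List[T] = []
    unique: List[T] = []
    for path in paths:
        if path in seen:
            duplicates.append(path)  # every repeated occurrence is reported
        else:
            seen.add(path)
            unique.append(path)  # first occurrence is kept
    return duplicates, unique


print(deduplicate(["a", "b", "a", "c", "b", "a"]))
```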
-
filehooks.
generate_duplicate_path_warning
(path: filehooks.ItemPath) → str[source]¶ Generates a message warning about a duplicate path in a metadata file’s collectionContent.
- Parameters:
path – the path appearing more than once
- Returns:
a message warning about a duplicate path in a metadata file’s collectionContent
-
filehooks.
get_branch_to_index
(project: gitlab.v4.objects.Project) → str[source]¶ Returns the name of the branch which should be indexed, following the priority given by BRANCH_PRIORITY. Raises NoBranchToIndexError if no branch with a name appearing in BRANCH_PRIORITY exists.
- Parameters:
project – The project whose branches should be checked
- Returns:
The name of the branch to index
- Raises:
NoBranchToIndexError – when no branch with a name in BRANCH_PRIORITY exists
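The priority-based branch selection can be sketched as follows. The contents of `BRANCH_PRIORITY` are not documented here, so the list below is purely hypothetical, and the sketch takes plain branch names rather than a GitLab project object.

```python
from typing import Iterable, List

# Hypothetical priority order; the real BRANCH_PRIORITY is not documented here.
BRANCH_PRIORITY: List[str] = ["main", "master"]


class NoBranchToIndexError(Exception):
    """Raised when no branch name appears in BRANCH_PRIORITY."""


def pick_branch(branch_names: Iterable[str]) -> str:
    available = set(branch_names)
    for candidate in BRANCH_PRIORITY:  # earlier entries win
        if candidate in available:
            return candidate
    raise NoBranchToIndexError("no branch from BRANCH_PRIORITY exists")


print(pick_branch(["dev", "master", "main"]))
```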
-
filehooks.
get_commit_to_index
(project_id: int, gitlab_instance: gitlab.Gitlab) → Tuple[gitlab.v4.objects.Project, str, str][source]¶ Tries to obtain the project, branch name and commit of the project which should be indexed. Raises a NoBranchToIndexError if no suitable branch is found.
- Parameters:
project_id – the id of the GitLab project
gitlab_instance – the GitLab instance
- Returns:
project, branch name, commit hash
- Raises:
NoBranchToIndexError – if no suitable branch exists
-
filehooks.
get_indexing_commit
(project: gitlab.v4.objects.Project) → Tuple[str, str][source]¶ Tries to obtain the branch name and commit of the project which should be indexed.
- Parameters:
project – the project which should be indexed
- Returns:
branch name, commit hash
- Raises:
NoBranchToIndexError – if no suitable branch exists
-
filehooks.
get_project_for_id
(project_id: int, gitlab_instance: gitlab.Gitlab) → gitlab.v4.objects.Project[source]¶ Tries to obtain the project from git.
- Parameters:
project_id – the id of the GitLab project
gitlab_instance – the GitLab instance
-
filehooks.
get_relevant_commit_hash
(project: gitlab.v4.objects.Project, branch_name: str) → str[source]¶ Gets the hash of the latest commit on the branch to index for the given project.
- Parameters:
project – the project to get the hash for
branch_name – name of the branch of which the latest commit should be shown
- Returns:
the hash of the latest commit on the branch to index.
-
filehooks.
get_repository_metadata_files
(project: gitlab.v4.objects.Project, commit: str) → List[str][source]¶ Returns all metadata files in the repository’s root.
- Parameters:
project – the project instance
commit – the hash of the commit to analyze
- Returns:
List of metadata files
-
filehooks.
in_main_group
(path_with_namespace: str) → bool[source]¶ Checks if the root of the given namespace is in the main group
- Parameters:
path_with_namespace – path to the repository to be checked
- Returns:
True, if in main_group
-
filehooks.
is_regular_file
(mode: str) → bool[source]¶ Returns true if the file mode corresponds to a regular file and false otherwise.
- Parameters:
mode – file mode
- Returns:
True, if regular file.
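Git stores a mode string for every tree entry: regular files use modes beginning with 100 (such as 100644 and 100755), directories use 040000, symbolic links 120000, and submodules 160000. A sketch of a check based on that convention follows; that the package tests the mode in exactly this way is an assumption.

```python
def is_regular_file(mode: str) -> bool:
    # Regular files (executable or not) have modes starting with "100";
    # trees, symlinks and submodules use other prefixes.
    return mode.startswith("100")


print(is_regular_file("100644"), is_regular_file("120000"))
```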
-
filehooks.
load_config
(config_type: filehooks.ConfigType = <ConfigType.PRODUCTION: 0>) → Tuple[gitlab.Gitlab, filehooks.Mail, elasticsearch.Elasticsearch][source]¶ Parses the configuration in filehooks/config.ini. This file only exists when deployed in the GitLab container and is a copy of one of the files in filehooks/conf/. Which file is used depends on the configuration when setting up GitLab.
-
filehooks.
logger_setup
(filepath: str) → Dict[str, Any][source]¶ Returns a dictionary which can be used to configure a logger.
- Parameters:
filepath – path of the log file
- Returns:
a dictionary to configure a logger
-
filehooks.
normalize_collection_content_paths
(collection_content: List[str], parent: filehooks.ItemPath) → Tuple[List[str], List[filehooks.ItemPath]][source]¶ Takes the “collectionContent” list from a metadata file, normalizes the paths, and warns about issues, such as duplicate paths.
- Parameters:
collection_content – the “collectionContent” list from a metadata file
parent – normalized path of the metadata file from which “collectionContent” is taken
- Returns:
tuple consisting of a list of error messages and a list of normalized file paths without duplicates
-
filehooks.
normalize_path
(child_path: str, parent: filehooks.ItemPath, parent_path: Optional[str] = None) → str[source]¶ Takes a potentially un-normalized child path and normalizes it, checking for potential errors. If the path can be normalized and is valid, the normalized path is returned as a str. Otherwise, a PathError is raised.
- Parameters:
child_path – potentially un-normalized path from the parent’s “collectionContent”
parent – the parent of this child
parent_path – normalized path of a metadata file
- Returns:
the normalized path
- Raises:
PathError – when the path cannot be normalized or is invalid
-
filehooks.
parse_child_path
(child_path: str) → Tuple[str, int, str][source]¶ Parses a child_path and returns the git URL, the project id, and the relative path.
-
filehooks.
parse_metadata_file
(file_contents: bytes, extension: str) → Dict[str, Any][source]¶ Given the base64 encoded content of a file and the extension of the file, this function attempts to parse the content according to the format specified by the extension. If the extension is ‘.json’ it tries to parse it as a json file, otherwise the data is treated as YAML.
- Parameters:
file_contents – base64 encoded content of the file to parse
extension – the file extension, including a leading dot. One of {‘.json’, ‘.yaml’, ‘.yml’}
- Returns:
If successful, a dictionary with the parsed key-value pairs
- Raises:
ValueError – if the file contents cannot be parsed as a dictionary
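The parsing steps described above (base64-decode, then JSON or YAML depending on the extension) can be sketched as follows. The function name `parse_metadata` is hypothetical, and the use of PyYAML's `yaml.safe_load` for the YAML branch is an assumption; it is imported lazily so that the JSON path works without it.

```python
import base64
import json
from typing import Any, Dict


def parse_metadata(file_contents: bytes, extension: str) -> Dict[str, Any]:
    text = base64.b64decode(file_contents).decode("utf-8")
    if extension == ".json":
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError as error:
            raise ValueError(f"invalid JSON: {error}") from error
    else:
        import yaml  # PyYAML, assumed here to be the YAML parser

        try:
            parsed = yaml.safe_load(text)
        except yaml.YAMLError as error:
            raise ValueError(f"invalid YAML: {error}") from error
    if not isinstance(parsed, dict):
        # Lists, scalars, or empty documents are not valid metadata.
        raise ValueError("metadata must be a mapping at the top level")
    return parsed


encoded = base64.b64encode(b'{"title": "Sorting exercise"}')
print(parse_metadata(encoded, ".json"))
```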
-
filehooks.
path_2_key
(validation_error: jsonschema.exceptions.ValidationError) → str[source]¶ Returns the path to the key for which a validation error occurred.
- Parameters:
validation_error – Validation error
- Returns:
Path to validation error
-
filehooks.
read_gitlab_event
() → Optional[Dict[str, Any]][source]¶ Reads the GitLab system hook event from stdin.
- Returns:
The event
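GitLab passes the event to a file hook as JSON on standard input. A minimal sketch of reading it is shown below; the function name `read_event`, the stream parameter (added so the example can run without real stdin), and returning None for malformed input are assumptions.

```python
import io
import json
import sys
from typing import Any, Dict, Optional, TextIO


def read_event(stream: TextIO = sys.stdin) -> Optional[Dict[str, Any]]:
    raw = stream.read()
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None  # assumed behaviour for malformed or empty input
    return event if isinstance(event, dict) else None


event = read_event(io.StringIO('{"event_name": "push", "project_id": 7}'))
print(event)
```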
-
filehooks.
unify_keywords
(file_contents: Dict[str, Any]) → Dict[str, Any][source]¶ Given a dictionary representing the user provided metadata, this function normalizes some keywords. Currently the only thing it does is converting the programming language names to titlecase.
- Parameters:
file_contents – dictionary representing the user provided metadata
- Returns:
the input dictionary with normalized keywords
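A sketch of the title-case normalization. The key name `programmingLanguage` is hypothetical, since the actual metadata key is not named in this documentation, and the sketch returns a copy whereas the real function may modify its input in place.

```python
from typing import Any, Dict

LANGUAGE_KEY = "programmingLanguage"  # hypothetical key name


def unify(file_contents: Dict[str, Any]) -> Dict[str, Any]:
    result = dict(file_contents)  # shallow copy, the input stays unchanged
    languages = result.get(LANGUAGE_KEY)
    if isinstance(languages, list):
        result[LANGUAGE_KEY] = [str(name).title() for name in languages]
    return result


print(unify({LANGUAGE_KEY: ["python", "JAVA"], "title": "Loops"}))
```

Note that `str.title()` capitalizes only the first letter of each word, so a name such as "javascript" becomes "Javascript".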
-
filehooks.
validate_collection
(gitlab_instance: gitlab.Gitlab, parent: filehooks.ItemPath, collection_content: List[str], schema: Any, nesting: int, visited_parents: List[filehooks.ItemPath] = []) → Tuple[List[str], List[filehooks.MetadataInfo], List[filehooks.Node]][source]¶ Parses a metadata file’s collectionContent. Returns a tuple consisting of a list of errors in the parent’s collectionContent specification, a list of errors obtained recursively from the children, and a list of child nodes.
- Parameters:
parent – ItemPath to current meta data
collection_content – the metadata file’s collectionContent (list of paths)
schema – a dictionary which can be used by jsonschema Draft 7
nesting – nesting level of children (0 is top most)
- Returns:
a tuple consisting of a list of errors in the parent’s collectionContent specification,
a list of errors obtained recursively from the children, and a list of child nodes
-
filehooks.
validate_metadata_file
(gitlab_instance: gitlab.Gitlab, validation_service: filehooks.ValidationService, metadata_file_path: filehooks.ItemPath, schema: Any, nesting: int, visited_parents: List[filehooks.ItemPath] = []) → Tuple[List[filehooks.MetadataInfo], Optional[filehooks.Node]][source]¶ Validates a single metadata file. A file is valid iff it can be parsed in the format specified by its extension (JSON or YAML) and the content conforms to the given schema.
- Parameters:
metadata_file_path – the file to validate
schema – a dictionary which can be used by jsonschema Draft 7
nesting – nesting level of children (0 is top most)
- Returns:
Information about the metadata file, including a potentially empty list of errors.
scripts¶
Collection of all scripts which can be installed in GitLab to extend its functionality.
Note
Those scripts assume that the package filehooks
is installed!
Tests¶
Tests are written using the Python testing framework pytest
.
Unit tests¶
Unit tests can be found in tests/filehooks
and can be executed manually by the following command:
pytest --cov-report term-missing --cov=filehooks/ tests/filehooks
After each change to the code, make sure that code coverage remains at 100%. On every push event, unit tests are automatically executed by GitLab CI/CD.
Integration tests¶
Since the code depends heavily on GitLab's behavior, which is mocked in the unit tests, integration tests are provided to check that the entire system works as expected.
Note
The integration tests’ current implementation is not very robust: some tests may fail when executed too quickly, even though they would pass if given enough time.
The integration tests are not executed automatically by GitLab CI/CD.
Those tests can be executed locally by running ./run_integration_tests.sh
.
In order to run the integration tests in isolation, they run in a dedicated
set of containers.
These are created automatically upon calling the script mentioned above,
if they do not exist already.
They have the same names as the containers which are created for production,
but with the postfix _integration
.
Data used by these containers and the container running the integration tests
is stored in /tmp/sharing/integration/
by default.
This location can be configured in the run_integration_tests.sh
script.
Note
Some tests use the configuration file filehooks/conf/conf.test.ini
. Please ensure that the correct values are set before test execution.
Note
The containers use quite a lot of memory, so running the integration tests while the normal set of containers is running could be problematic on systems with too little memory/swap space.
Linter¶
The tools pylint and flake8 are used for static code analysis. Experience shows that pylint is stricter and more verbose. However, if flake8 finds a potential issue, it is worth checking out.
Some default settings of the tools had to be adjusted. It should be ensured that the pylint score always reaches 10 points.
If a potential issue is fine the way it is, suppress it.
Automated code checks¶
This project uses git hooks to automatically check the code. Git hooks are scripts which are automatically executed upon certain git events. They can be installed on the client side (the developer's machine) and on the server side (the git server which hosts the repository, e.g. a GitLab instance).
Installation¶
For development, a client-side hook is used to check code that is about to be committed. This is done by a pre-commit hook. Client-side hooks need to be installed by the developer on their machine:
1. Install the Python package pre-commit, e.g. by running pip install pre-commit. For other installation methods, see https://pre-commit.com/#installation.
2. Go to the root directory of the project and run pre-commit install.
Working with code checks in place¶
Once the two installation steps have succeeded, git will automatically run some checks on the code before every commit. If any check fails, the commit is aborted.
Some checks which auto-format files fail if they do any formatting.
When this happens, the changes they do are not put into the git index
and need to be added manually, e.g. by running git add <auto-formatted file>
in order to include them in the commit.
Some IDEs which allow committing from within them might not do a good job at displaying the error messages.
Running git commit
from a terminal should give much better feedback,
including colored messages and information about which check is running at the moment.
Checks in use¶
The pre-commit hook runs several checks.
The detailed configuration is found in the pre-commit configuration file .pre-commit-config.yaml.
These checks are run:
- some generic hooks (pre-commit defaults)
- isort: sorts import statements (changes are not added to git index automatically)
- black: formats code (changes are not added to git index automatically)
- mypy: checks type annotations
- pyright: checks type annotations
- flake8: linter
- pylint: linter
- pytest: runs unit tests
Please keep in mind that the integration tests are not run automatically. Since it takes several minutes to run them, it would not be reasonable to include them here.
Disabling checks¶
Sometimes it might be necessary to temporarily disable one of the checks.
One way to do it is forcing the commit, which just skips all checks.
However, this is discouraged, since most checks will probably be useful.
For example, when pylint finds an issue in the code which you intend to fix in a later commit,
it still makes sense to auto-format the code. In such cases the SKIP
environment variable can be used:
SKIP=pylint,flake8 git commit
sets this variable and initiates a commit.
The variable takes a comma-separated list of checks to skip.
The names of the tools can be found in the configuration file for pre-commit, .pre-commit-config.yaml.
There, look for the values of the id keys.
GitLab CI¶
Since the git-hook based checks need to be installed by each developer on their machine, it could happen that someone forgets to use them. GitLab continuous integration jobs are used to run most of the same checks in our GitLab instance. This requires a GitLab runner to be installed.
The configuration is found in .gitlab-ci.yml
.
Each push to GitLab starts a pipeline which runs the configured tools. Linter failures are indicated as warnings, unit test failures as errors which abort the pipeline. It is possible to view the command line output of the tool in GitLab, and in some cases artifacts can be downloaded.
Commandline Interface¶
A command-line interface for elasticsearch index manipulation is provided. It supports the following operations:
- List indexes (list-index)
- Create index (create-index)
- Delete index (delete-index)
- Delete unused MetaData indices (delete-unused-indices)
- List aliases (list-alias)
- Switch aliases (switch-alias)
- Reindex (reindex)
Workflow¶
1. Start a shell in the sharing_gitlab docker container:
   docker exec -it sharing_gitlab /bin/bash
2. Navigate to utils (e.g., /file-hooks-src/utils)
3. Run python3 cli.py with the appropriate arguments
Note
Running python3 cli.py with no arguments yields the help output.
Commands¶
list-index¶
Functionality: List all elasticsearch indexes
Usage:
python3 cli.py list-index
create-index¶
Functionality: Creates new indexes for metadata information
Usage:
python3 cli.py create-index \
    -idx-metadata IDX_METADATA
Arguments:
IDX_METADATA: Name of new index for metadata information
delete-index¶
Functionality: Deletes elasticsearch index
Usage:
python3 cli.py delete-index IDX
Arguments:
IDX: Name of index to be deleted
delete-unused-indices¶
Functionality: Deletes currently unused metadata indices in elasticsearch
Usage:
python3 cli.py delete-unused-indices
list-alias¶
Functionality: List all elasticsearch aliases
Usage:
python3 cli.py list-alias
switch-alias¶
Functionality: Removes the alias for the old metadata index and adds a new alias for the new metadata index
Usage:
python3 cli.py switch-alias \
    --old-idx-metadata OLD_IDX_METADATA \
    --new-idx-metadata NEW_IDX_METADATA
Arguments:
OLD_IDX_METADATA: Name of old index for metadata information
NEW_IDX_METADATA: Name of new index for metadata information
Note
Running python3 cli.py -h yields the help output.
Running python3 cli.py <command> -h yields the help for the specified command.
reindex¶
Functionality: Creates a new index for metadata, fills it, and switches the alias.
Usage:
python3 cli.py reindex
Example¶
Assuming you have executed steps 1 and 2 from the workflow, a reindexing session might look like this:
$ python3 cli.py list-index
list-index
health status index          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   idx_metadata_0 KGQ0aWacREKM44aEYLWkNA   1   1          4            3     41.1kb         41.1kb
$ python3 cli.py list-alias
list-alias
alias    index          filter routing.index routing.search is_write_index
metadata idx_metadata_0 -      -             -              true
$ python3 cli.py create-index idx_metadata_1
$ python3 cli.py list-index
list-index
health status index          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   idx_metadata_1 LaA64koiRESp4eF-Of1GTA   1   1          4            0     27.9kb         27.9kb
yellow open   idx_metadata_0 KGQ0aWacREKM44aEYLWkNA   1   1          4            3     41.1kb         41.1kb
$ python3 cli.py switch-alias --old-idx-metadata idx_metadata_0 --new-idx-metadata idx_metadata_1
switch-alias
You are about to switch the following alias: 'idx_metadata_0 -> 'idx_metadata_1'
Would you like to continue? [Y/n] Y
The aliases were switched!
$ python3 cli.py list-alias
list-alias
alias    index          filter routing.index routing.search is_write_index
metadata idx_metadata_1 -      -             -              true
$ python3 cli.py delete-index idx_metadata_0
delete-index
You are about to delete the indexes in the list ['idx_metadata_0'].
Would you like to continue? [Y/n] Y
The indexes in the list ['idx_metadata_0'] were deleted!
Warning
If users add content between the index creation and switching of the alias, this content might not be indexed until it is changed again!