.. _ref_filehooks:
FileHooks and Indexing
======================
The fileHooks project (https://sharing-codeability.uibk.ac.at/development/sharing/file-hooks) is a simple infrastructure for forwarding
events from GitLab to the GitSearch REST service at http://sharing_search:8080/api/gitlab/eventListener.
GitLab and Elasticsearch are considered backend services.
The backend is responsible for data collection and preparation.
This section describes the fileHooks used in GitLab and the infrastructure setup.
Finally, some tips to handle errors are provided.
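The forwarding idea can be illustrated with a minimal sketch. It is not the actual fileHooks code. It assumes that the hook receives the GitLab event as JSON on standard input and forwards it unchanged as a JSON ``POST`` request. The helper names ``build_request`` and ``main`` are hypothetical, as is the assumption that the event listener accepts the raw event.

.. code-block:: python

    import json
    import os
    import sys
    import urllib.request

    # Default URL taken from the text above; the environment variable name
    # matches the INDEXING_SERVICE_URL setting described in this section.
    DEFAULT_URL = "http://sharing_search:8080/api/gitlab/eventListener"


    def build_request(event, url):
        """Wrap a GitLab event in a JSON POST request (hypothetical helper)."""
        body = json.dumps(event).encode("utf-8")
        return urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )


    def main():
        # Assumption: the event arrives as a single JSON document on stdin.
        event = json.load(sys.stdin)
        url = os.environ.get("INDEXING_SERVICE_URL", DEFAULT_URL)
        with urllib.request.urlopen(build_request(event, url), timeout=10) as resp:
            # Return a non-zero code if the service did not answer with 2xx.
            return 0 if 200 <= resp.status < 300 else 1

A hook script built on this sketch would end with ``sys.exit(main())`` so that a failed delivery is visible as a non-zero exit code.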
Infrastructure Setup
--------------------
It is currently assumed that all services run on the same host as separate docker containers.
The setup of the containers and the host server is discussed in the following.
Lastly, the manual installation procedure for file hooks is given as a reference.
Container Setup
~~~~~~~~~~~~~~~
The setup for GitLab, PlantUML, and Elasticsearch is shown below.
The setup of the services GitLab search and MySQL is discussed in the section :ref:`ref_git_search`.
To create all containers for the backend in production, a docker-compose script is provided in ``src/main/docker/gitsearch.yml``.
The Docker container for GitLab is extended with sendmail. The extended Docker definition can be found in ``src/main/docker/gitlab-setup/sendmail/Dockerfile``, which is referenced in ``gitsearch.yml``.
The docker-compose script can be executed as follows:
- ``cd src/main/docker/``
- ``. .env`` (the ``.env`` file contains the secrets for the containers; a current version can be found in the KeePass file)
- ``export GITLAB_HOME``
- ``export INDEXING_SERVICE_URL``
- ``export GITBRANCH=development # or some other gitlab branch``
- ``export COMMIT_ID=$(git rev-parse HEAD); export COMMIT_DATE=$(git show -s --format=%ct)``
- ``docker-compose -f gitsearch.yml create gitlab`` # this may fail, if the format of the gitsearch-app version needs to be adapted.
.. note::
The command may complain about the missing variables ``COMMIT_ID`` and ``COMMIT_DATE``. This is fine, since the variables are only used in the gitlab container.
- ``docker-compose -f gitsearch.yml up -d gitlab plantuml elasticsearch``
The following environment variables are set within the config files.
No modification should be required for those if the correct config file is used.
- ``GITLAB_HOME``: Directory where data generated by GitLab is persisted
- ``EXTERNAL_URL``: External URL of the GitLab instance
- ``GITLAB_HOSTNAME``: Hostname of GitLab
- ``ES_HOME``: Directory where data generated by Elasticsearch is persisted
- ``INDEXING_SERVICE_URL``: The URL of the GitLab event indexing service (local to the Docker network)
+----------------------+-----------------------------------------------------+-----------------------------------------------------+
| Environment variable | Production | Development |
+======================+=====================================================+=====================================================+
| GITLAB_HOME | /mnt/qt-sharing-codeability/srv/gitlab | /mnt/qt-codeability-austria/sharing/srv/gitlab |
+----------------------+-----------------------------------------------------+-----------------------------------------------------+
| ES_HOME | /mnt/qt-sharing-codeability/es | /mnt/qt-codeability-austria/sharing/es |
+----------------------+-----------------------------------------------------+-----------------------------------------------------+
| EXTERNAL_URL | https://sharing-codeability.uibk.ac.at | https://sharing.codeability-austria.uibk.ac.at |
+----------------------+-----------------------------------------------------+-----------------------------------------------------+
| GITLAB_HOSTNAME | sharing-codeability.uibk.ac.at | sharing.codeability-austria.uibk.ac.at |
+----------------------+-----------------------------------------------------+-----------------------------------------------------+
| INDEXING_SERVICE_URL | http://sharing_search:8080/api/gitlab/eventListener | http://sharing_search:8080/api/gitlab/eventListener |
+----------------------+-----------------------------------------------------+-----------------------------------------------------+
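Before starting the containers, it can help to confirm that the variables listed above are actually set. The following is a minimal sketch of such a check. The helper ``missing_variables`` is hypothetical and not part of the repository.

.. code-block:: python

    import os

    # Variable names are taken from the list and table above.
    REQUIRED_VARIABLES = (
        "GITLAB_HOME",
        "ES_HOME",
        "EXTERNAL_URL",
        "GITLAB_HOSTNAME",
        "INDEXING_SERVICE_URL",
    )


    def missing_variables(environ=os.environ):
        """Return the required variables that are unset or empty."""
        return [name for name in REQUIRED_VARIABLES if not environ.get(name)]

Calling ``missing_variables()`` in the shell session that will run ``docker-compose`` returns an empty list when everything is configured.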
.. note::
If the container is set up from scratch (there are no persisted data available), a password for the user ``root`` has to be specified using the web interface.
For the development and production server, this password should be added to KeePass.
Alternatively, the password can also be set directly in the GitLab container:
``docker exec -it sharing_gitlab gitlab-rake 'gitlab:password:reset[root]'``
Installing the Filehooks package
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The previous section set up the container infrastructure.
Once that has succeeded, the filehooks code needs to be installed in the GitLab container.
There is a script in the ``src/main/filehooks/setup`` directory which does this automatically:
.. code-block::
./install_filehooks_locally.sh
This script copies files from the repository into the GitLab container
and sets up the code such that it is run whenever GitLab emits an event.
The script also installs the required python packages.
For debugging and inspection, the GitLab container can be accessed interactively by running:
.. code-block::
docker exec -it sharing_gitlab /bin/bash
Server Setup
~~~~~~~~~~~~
To make GitLab reachable from outside, conduct the following steps after connecting to the server via ``ssh``:
1. Add the following snippet to ``/etc/apache2/sites-enabled/default-ssl.conf``:
.. code-block::
# Sharing
SSLProxyEngine On
AllowEncodedSlashes NoDecode
SSLProxyVerify none
SSLProxyCheckPeerCN off
SSLProxyCheckPeerName off
SSLProxyCheckPeerExpire off
##### gitlab #####
ProxyPass / https://sharing-codeability.uibk.ac.at:10083/ nocanon
ProxyPassReverse / https://sharing-codeability.uibk.ac.at:10083/
##### Portainer #####
RewriteRule ^/portainer$ /portainer/ [R,L]
ProxyPass /portainer/ http://sharing-codeability.uibk.ac.at:9000/
ProxyPassReverse /portainer/ http://sharing-codeability.uibk.ac.at:9000/
ProxyPass /portainer/api/websocket/ http://sharing-codeability.uibk.ac.at:9000/api/websocket/
ProxyPassReverse /portainer/api/websocket/ http://sharing-codeability.uibk.ac.at:9000/api/websocket/
ErrorLog ${APACHE_LOG_DIR}/error.sharing-codeability.uibk.ac.at.log
CustomLog ${APACHE_LOG_DIR}/access.sharing-codeability.uibk.ac.at.log combined
##### static pages frontup #####
# Michael further tools settings
Include /etc/apache2/codeAbility/sharing/*.conf
.. note::
Please review the configuration above carefully. GitLab is very sensitive when run behind a reverse proxy!
2. Add the following snippet to ``/etc/apache2/sites-enabled/000-default.conf`` for a redirect from http to https:
.. code-block::
# ...
Redirect / https://sharing-codeability.uibk.ac.at
3. ``sudo systemctl restart apache2``
Manual File Hook Setup
~~~~~~~~~~~~~~~~~~~~~~
.. note::
Just for reference. This should not be necessary, since the filehooks are installed automatically by the script ``install_filehooks_locally.sh``.
As a reference on how to add other file hooks to GitLab, the steps to install the file hook ``trigger_project_update.py`` are given below:
1. Install python requirements:
.. code-block:: bash
pip3 install --upgrade setuptools
pip3 install -r requirements.txt
2. Create API-Token with the user ``root`` and the scopes ``api``, ``read_api``, ``read_repository``
3. Add the API-Token in ``conf.production.ini`` (section ``gitlab``, key ``token``)
4. Install the ``filehooks`` package
.. code-block:: bash
pip3 install .
5. Install java
.. code-block:: bash
apt-get install openjdk-8-jdk
6. Optional: (Re-)initialize the indices in the gitsearch application (menu item Administration -> Elasticsearch Management).
A click on the button "+ reindex" creates a new metadata index and indexes the complete repository anew. During the indexing process, the previous index is still available for search queries.
The indexing process can take a long time, depending on the size of the repository. Its progress can be viewed in the logging panel on the page.
Incoming GitLab events are withheld until the indexing process is finished.
At the end of the reindexing process, the new index is activated and the old index is deactivated. The new index is then available for search queries.
7. You can delete the old indices by clicking on the button "delete".
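One common way to switch from an old index to a new one is to move an Elasticsearch alias with a single ``POST /_aliases`` request. Whether the gitsearch application implements the switch this way is an assumption. The sketch below only builds the request body, and the index names in the usage note are hypothetical.

.. code-block:: python

    def alias_swap_actions(alias, old_index, new_index):
        """Build the body of a ``POST /_aliases`` request that moves an alias."""
        return {
            "actions": [
                # Both actions are sent in one request so that searches
                # never see the alias without a target index.
                {"remove": {"index": old_index, "alias": alias}},
                {"add": {"index": new_index, "alias": alias}},
            ]
        }

For example, ``alias_swap_actions("metadata", "metadata_old", "metadata_new")`` yields a body that points the alias ``metadata`` at the freshly built index.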
Infrastructure Update
---------------------
This section provides a guide for updating GitLab and the filehooks.
Update Guide GitLab
~~~~~~~~~~~~~~~~~~~
1. Navigate to the directory ``src/main/filehooks``.
2. Create a backup of GitLab with the script ``backup_sharing_gitlab.sh``
3. Navigate to the parent directory of ``$GITLAB_HOME`` and copy the mounted volume, e.g.,
.. code-block::
cp -a srv srv_2021_01_31
4. Change the GitLab version in the file ``src/main/docker/gitlab-setup/sendmail/Dockerfile``.
5. Then proceed with the steps described in the section :ref:`ref_container_setup`.
6. Check if the filehooks work properly.
.. note::
When upgrading the GitLab version, follow the `upgrade recommendations `_ from GitLab.
Update Guide Filehooks
~~~~~~~~~~~~~~~~~~~~~~
1. Somewhere in the file system, check out the version of the code that should be deployed.
2. Run ``cd src/main/filehooks; ./install_filehooks_locally.sh``. This script mainly copies the current filehook to the docker volume and installs required python packages.
Errors
------
If a container crashes, it should restart automatically.
Consequently, it should not be necessary to start any container manually after the setup has been executed successfully.
.. warning::
If the GitLab container crashes, the Python package ``filehooks`` is not re-installed automatically. (TODO: check whether this is still true)
Hence, new or changed files will not be added to Elasticsearch.
In that case, install the filehooks again (see the update guide for filehooks) and do a complete reindexing to ensure a consistent index.
The logging systems for GitLab and FileHooks are discussed below.
GitLab
~~~~~~
GitLab has an advanced logging system distributed over many log files. Details can be found in the `GitLab documentation `_.
For example, the command ``docker logs -f -n 10 sharing_gitlab`` can be used to inspect the logs.
FileHooks
~~~~~~~~~
- ``/var/log/gitlab/gitlab-rails/file_hook.log``: Fatal errors (e.g., unexpected exceptions) are logged in this file.
- ``/var/log/gitlab/gitlab-rails/trigger_project_update.log``: General logging information for the fileHook ``trigger_project_update.py`` is logged in this file.
.. _ref_gitsearch_indexer:
GitSearch Indexer
-----------------
The GitSearch Indexer listens to requests via the REST service at http://sharing_search:8080/api/gitlab/eventListener.
It is responsible for validating and updating the Elasticsearch index.
The GitSearch Indexer performs two tasks:
1. Health check and validation: After a modification in a repository, it informs the user who modified the project via email if the metadata information is incomplete or invalid.
Validation happens on the ``master``/``main``-branch of all projects in the group ``sharing``. It also checks projects in all other groups; however, if they do not contain metadata, the check is skipped.
The indexer is mainly triggered by push events, but also by moving or renaming projects or groups/namespaces.
The check proceeds as follows:
First, the root directory of the repository is checked for files named ``metadata.json``, ``metadata.yaml``, or ``metadata.yml``.
There must be exactly one such file, otherwise the check fails.
Subsequently, the correctness of all metadata files is validated (including dependent metadata files if the project is a collection).
If an error occurs, an email is sent to the user who pushed the changes.
Metadata checks comprise:
- the syntactical correctness of the metadata file as yaml or json file (results in an error)
- the presence of the required fields (results in an error)
- the presence of the required fields in the dependent metadata files (results in an error)
- checks against the vocabulary service at https://oeresource.logic.at/en/meta/api/v1?format=json (results in a warning)
The check fails if there is an error, but is accepted if there are only warnings. In both cases the author is informed by e-mail.
2. It keeps the Elasticsearch index up-to-date by adding/updating/deleting files according to the triggered GitLab event.
Only the ``main``-branch (or ``master`` if ``main`` does not exist) and the group ``sharing`` (including subgroups and all subprojects) are indexed in Elasticsearch.
Metadata files (``metadata.json``, ``metadata.yaml``, or ``metadata.yml``) at the project root are indexed in the alias ``metadata``.
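The validation rules described above can be sketched as follows. This is an illustration, not the GitSearch implementation. The required fields, the ``keyword`` field used for the vocabulary check, and both function names are hypothetical. Only JSON is parsed here to keep the sketch free of third-party packages.

.. code-block:: python

    import json

    METADATA_NAMES = ("metadata.json", "metadata.yaml", "metadata.yml")
    # Hypothetical required fields; the real list is defined by the schema.
    REQUIRED_FIELDS = ("title", "description")


    def find_metadata_file(root_files):
        """Return the single metadata file name, or raise if not exactly one."""
        found = [name for name in root_files if name in METADATA_NAMES]
        if len(found) != 1:
            raise ValueError(
                f"expected exactly one metadata file, found {len(found)}"
            )
        return found[0]


    def check_metadata(text, vocabulary=None):
        """Return (errors, warnings) for the content of a metadata.json file."""
        errors, warnings = [], []
        try:
            data = json.loads(text)
        except json.JSONDecodeError as exc:
            # Syntax problems are errors, so the check fails.
            return [f"syntax error: {exc.msg}"], warnings
        if not isinstance(data, dict):
            return ["metadata must be a mapping"], warnings
        for field in REQUIRED_FIELDS:
            if field not in data:
                errors.append(f"missing required field: {field}")
        if vocabulary is not None:
            # Vocabulary mismatches only produce warnings.
            for keyword in data.get("keyword", []):
                if keyword not in vocabulary:
                    warnings.append(f"unknown vocabulary term: {keyword}")
        return errors, warnings

A check built on this sketch would fail when ``errors`` is non-empty and pass when only ``warnings`` are present, with an email sent to the author in both cases.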
Finally, the GitSearch Indexer provides functionality to recreate the index and to recheck all projects. During this task, all event processing is postponed.
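Postponing events during a reindex can be pictured as a gate that queues incoming events and replays them in arrival order once the reindex has finished. The class below is a single-threaded sketch under that assumption. ``EventGate`` and its methods are hypothetical names, and the actual indexer may handle this differently.

.. code-block:: python

    from collections import deque


    class EventGate:
        """Hold back events while a reindex runs, then replay them in order."""

        def __init__(self, handler):
            self._handler = handler   # called once per processed event
            self._pending = deque()   # events received during a reindex
            self._reindexing = False

        def submit(self, event):
            """Process the event now, or queue it if a reindex is running."""
            if self._reindexing:
                self._pending.append(event)
            else:
                self._handler(event)

        def begin_reindex(self):
            self._reindexing = True

        def end_reindex(self):
            """Reopen the gate and replay queued events in arrival order."""
            self._reindexing = False
            while self._pending:
                self._handler(self._pending.popleft())

A production version would additionally need locking, since events and the reindexing task typically run in different threads.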