Data Library New Installation
This section provides instructions for installing a new instance of the Data Library software on a server configured with CentOS Stream 9 or 10.
See also
To install CentOS Stream on the server, see Install CentOS Stream.
Installation of the Data Library software is automated using ansible, a configuration management tool that runs on python3.12. If the server installation process was followed, a python virtual environment for it should already exist in /opt/datalib_venv3.12.
See Install python3.12, git and ansible.
Configure git
We currently use bitbucket.org as our git hosting service.
Each person installing or maintaining the Data Library needs their own account on https://bitbucket.org and should follow these instructions individually. If you do not already have an account, create one.
Once you have an account, ask the Data Library staff to grant you access to the repositories.
Create the administrator’s personal key pair
Create a new SSH key on the Data Library server:
ssh-keygen -t ed25519 -f ~/.ssh/id_bitbucket
This creates a private key, ~/.ssh/id_bitbucket, and a matching public key, ~/.ssh/id_bitbucket.pub, which you will use to communicate with Bitbucket. Set a passphrase for your key when prompted; it can be the same as your Bitbucket or login password.
Add the key to your ~/.ssh/config file by adding these lines to the file:
Host bitbucket.org
    AddKeysToAgent yes
    IdentityFile ~/.ssh/id_bitbucket
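If you prefer to script this step, the following is a minimal sketch that appends the same Host block only when a "Host bitbucket.org" line is not already present, so running it twice is harmless. The function name add_bitbucket_host and the SSH_CONFIG override are hypothetical, introduced only for this example.

```shell
# Minimal sketch (hypothetical helper): add the Bitbucket Host block to an
# SSH client config only if a "Host bitbucket.org" line is not already there.
# SSH_CONFIG is a hypothetical override; by default ~/.ssh/config is used.
add_bitbucket_host() {
    local config
    config="${SSH_CONFIG:-$HOME/.ssh/config}"
    mkdir -p "$(dirname "$config")"
    touch "$config"
    # ssh rejects a config file that other users can write to
    chmod 600 "$config"
    if grep -q '^Host bitbucket\.org$' "$config"; then
        echo "Host bitbucket.org is already configured in $config"
        return 0
    fi
    # Quoted 'EOF' keeps ~ literal; ssh expands it when it reads the file
    cat >> "$config" <<'EOF'

Host bitbucket.org
    AddKeysToAgent yes
    IdentityFile ~/.ssh/id_bitbucket
EOF
    echo "Added Host bitbucket.org to $config"
}
```

Run it once as the administrator with add_bitbucket_host, then inspect ~/.ssh/config to confirm the block looks as shown above.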
Add the administrator’s public key to Bitbucket
Add your key to bitbucket.org
1. Log in to your bitbucket.org account.
2. Select the Settings gear icon in the upper right of the browser window and select Personal Bitbucket Settings.
3. Under Security, select SSH Keys.
4. Select Add Key.
5. In the Add SSH key dialog, provide a Label to help you identify which key you are adding. For example, you could use the account name of the user on the server you’re setting up the key for.
6. Copy the contents of ~/.ssh/id_bitbucket.pub to the Key field of the Add SSH key dialog.
7. Select Add Key. If it fails, check that you copied the contents properly. It should look something like:
ssh-ed25519 LLoWYaPswHzVqQ7L7B07LzIJbntgmHqrE40t17nGXL71QX9IoFGKYoF5pJKUMvR+DZotTm user@example.com
Confirm that the key works by running the following on the Data Library server:
ssh -T git@bitbucket.org
It should return:
authenticated via ssh key
You can use git to connect to Bitbucket. Shell access is disabled
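If you want to repeat this check from a script, the following minimal sketch matches the expected greeting text instead of relying on the exit status of ssh -T, which this guide does not document. check_bitbucket_ssh is a hypothetical helper name.

```shell
# Minimal sketch (hypothetical helper): confirm that Bitbucket accepts the
# administrator's SSH key by looking for the greeting text.
check_bitbucket_ssh() {
    local out
    # The greeting may arrive on stderr, so capture both streams
    out=$(ssh -T git@bitbucket.org 2>&1)
    case "$out" in
        *"authenticated via ssh key"*)
            echo "OK: Bitbucket accepted your SSH key"
            return 0
            ;;
        *)
            echo "Bitbucket SSH check failed:" >&2
            echo "$out" >&2
            return 1
            ;;
    esac
}
```

Call it as check_bitbucket_ssh; on the first connection ssh may still ask you to confirm Bitbucket's host key.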
Create access key for deployment
The key we configured in the previous steps will be used by you, the system administrator, to make changes to the system configuration repository. We will now create a second key pair that will be used by non-administrators to deploy content updates from the data catalog and maproom repositories.
To create an access key:
ssh-keygen -t ed25519 -f ~/.ssh/datag_access_key
Leave the passphrase blank by pressing Enter at both passphrase prompts.
Add deployment access key to Bitbucket
In the following, _xxx stands for your country-specific suffix, for example kmd, nimet, or madagascar.
Install this key into each repository that ansible needs to access, for example:
dlentries_xxx
maproom_xxx
python_maproom_xxx (optional; needed only if you have a python maproom)
You should know where these repositories are, but you can also find them in your playbook.yaml file.
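Once you have a playbook.yaml, one quick way to list the Bitbucket repositories it refers to is to search it for SSH repository URLs. This minimal sketch assumes the repositories are written in the git@bitbucket.org:workspace/repository.git form; adjust the pattern if your playbook uses another URL form. list_bitbucket_repos is a hypothetical helper name.

```shell
# Minimal sketch (hypothetical helper): print each distinct Bitbucket SSH
# repository URL found in a playbook, i.e. the repositories that need the
# deployment access key. Assumes URLs like git@bitbucket.org:team/repo.git.
list_bitbucket_repos() {
    local playbook
    playbook="${1:-playbook.yaml}"
    grep -Eo 'git@bitbucket\.org:[A-Za-z0-9._/-]+' "$playbook" | sort -u
}
```

For example, list_bitbucket_repos ~/dlconfig/playbook.yaml prints one repository per line, each appearing once even if the playbook mentions it several times.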
For each repository, add the access key by navigating to Repository settings → Access keys, clicking the Add key button, and filling out the form as you did for the administrator’s key, but pasting the contents of ~/.ssh/datag_access_key.pub into the Key field.
Configure the Data Library
Create a configuration repository for your account. It will hold all your Data Library customizations, which are tracked in git and shared with any other team members responsible for Data Library installations and updates.
mkdir dlconfig
cd dlconfig
git init
Inside this new git repository, install the IRIDL ansible collection and dependencies.
source /opt/datalib_venv3.12/bin/activate
ansible-galaxy collection install -p . \
    git+https://github.com/iridl/iridl-ansible.git
The previous command should have downloaded the collection to a subdirectory called ansible_collections. Add that directory to your git repository, and commit the changes.
git add ansible_collections
git commit -m "add iridl ansible collection"
Copy template configuration files from the collection to the top level of the repository:
cp ansible_collections/iridl/iridl/example/* .
Modify playbook.yaml and secrets.yaml to customize them to the specifics of your site. The files you copied contain example configuration values that should be replaced with real email addresses, usernames, etc. The files include comments that explain the purpose of each configuration option. If you are not ready to set up your real Data Library server but merely want to practice the installation process, e.g. in a virtual machine, you can use the example files without modification.
See also
The example ssh key in secrets.yaml should be replaced with the deployment access key you created in Create access key for deployment.
Move secrets.yaml out of the git repository. For security reasons, unencrypted secrets should not be committed to version control.
mv secrets.yaml ..
Commit your customizations and push them to your git server for safekeeping. Back up secrets.yaml by other means, such as copying it to another machine.
git add inventory.cfg playbook.yaml
git commit -m "add inventory and playbook"
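To confirm that secrets.yaml stayed out of version control, the following minimal sketch checks that the file is neither tracked by git nor still sitting in the repository directory. check_no_secrets is a hypothetical helper name, and it expects to be run from the root of the configuration repository. If secrets.yaml was already committed at some point, untracking it does not remove it from the repository history.

```shell
# Minimal sketch (hypothetical helper): fail if secrets.yaml is tracked by
# git or still present in the configuration repository directory.
check_no_secrets() {
    # --error-unmatch makes ls-files exit non-zero when the file is untracked
    if git ls-files --error-unmatch secrets.yaml >/dev/null 2>&1; then
        echo "secrets.yaml is tracked by git" >&2
        return 1
    fi
    if [ -e secrets.yaml ]; then
        echo "secrets.yaml is still in the repository directory; move it out with: mv secrets.yaml .." >&2
        return 1
    fi
    echo "OK: secrets.yaml is not in the repository"
}
```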
Note
Never edit the contents of the ansible_collections directory. All customization should be made in the configuration files that you copied from the template. In the future, when it comes time to upgrade to a newer version of the DL software, you will run the ansible-galaxy command again and commit the new version to your configuration repository.
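The upgrade described in the note can be scripted roughly as follows. This is a minimal sketch, not a tested procedure: upgrade_iridl_collection and the DATALIB_VENV override are hypothetical, and it assumes that passing --force to ansible-galaxy collection install is how your Ansible version reinstalls a collection that is already installed (check ansible-galaxy collection install --help). Review the resulting commit with git show afterwards.

```shell
# Minimal sketch (hypothetical helper) of the upgrade step: reinstall the
# IRIDL collection over the existing copy, then commit it. Run from the root
# of the configuration repository. DATALIB_VENV is a hypothetical override.
upgrade_iridl_collection() {
    local venv
    venv="${DATALIB_VENV:-/opt/datalib_venv3.12}"
    # "." is the POSIX spelling of "source"
    . "$venv/bin/activate" || return 1
    # --force reinstalls even though a version is already present
    ansible-galaxy collection install -p . --force \
        git+https://github.com/iridl/iridl-ansible.git || return 1
    # -A also stages files that the new version deleted
    git add -A ansible_collections || return 1
    git commit -m "upgrade iridl ansible collection"
}
```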
Run the ansible playbook
Now you are ready to run the playbook, which will download, configure, and install the Data Library software using the parameters you defined in the configuration files. We have created a convenience script, run-ansible, for this purpose.
From the root directory of the configuration repository, run the following command:
./run-ansible --build
It will prompt you for a password. Enter the password of the user you are logged in as; that user must have sudo privileges.
BECOME password:
Each step of the installation will be printed to the terminal. At a site with a fast connection to the internet, the playbook generally finishes within ten minutes, but if bandwidth is limited it may take a few hours, as the installation process involves downloading several GB of software packages and container images.
You should now be able to visit your Data Library server in a browser, but the maprooms are not yet functional because the data that underlies them has yet to be installed.
Any flag you can pass to ansible-playbook can also be passed to the run-ansible script. For example, if there are errors or failures, you can pass -vvv to run-ansible to get more information about why it is failing.
./run-ansible -vvv
To run the playbook in check mode, which tests it without changing the Data Library, pass the --check flag:
./run-ansible --check
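Because a run can take a long time and produce a lot of output, it may help to keep a log of each run for troubleshooting. The following minimal sketch passes its arguments through to run-ansible, shows the output on the terminal, saves a copy with tee, and returns run-ansible’s exit status rather than tee’s. run_ansible_logged and the log-file naming are hypothetical; the log lands in the current directory, so keep it out of your commits.

```shell
# Minimal sketch (hypothetical helper): run ./run-ansible with the given
# flags, show the output, and save it to a timestamped log file.
run_ansible_logged() {
    local log status_file status
    log="run-ansible-$(date +%Y%m%d-%H%M%S).log"
    status_file="$log.status"
    # A pipeline's status is tee's, so record run-ansible's status separately
    { ./run-ansible "$@" 2>&1; echo $? > "$status_file"; } | tee "$log"
    status=$(cat "$status_file")
    rm -f "$status_file"
    return "$status"
}
```

For example, run_ansible_logged --build or run_ansible_logged -vvv behaves like the corresponding ./run-ansible command while keeping a copy of the output.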
Install datasets
Among other things, the ansible playbook creates structures (directories, groups, a database, and permissions) to support the installation of datasets. You can now install your data as described in Installing data. A member of the IRI staff will typically be involved in this process, as it may involve copying large amounts of data from an IRI server to yours.
Use your new Data Library
You should now be able to visit your Data Library server in a browser. For next steps, see the Maintenance page of the current guide, and the User Guide.