Installation

This section provides instructions for installing the Data Library software on a server.

Installation of the Data Library software is automated using ansible, a configuration management tool.

Prepare the server

Before running the installation script, perform the following steps to prepare the server:

  • Install CentOS 9 Stream from https://www.centos.org/centos-stream/#download. Select the appropriate architecture to download the ISO version.

    • For more information on this choice of operating system, see Infrastructure.

    • To create a bootable CD of the Installation ISO, follow these instructions.

    • Installation Notes

      • Install as Server with GUI to make it easier to maintain.

      • When configuring your disk partitions

        • Use LVM when creating the filesystems. This is the default.

          • Don’t modify /boot or /boot/efi

        • Increase root partition to 100GB

        • Create a /data partition large enough to hold all of your data now and in the future.

        • Leave 10% of the disk unallocated. This will allow you to create snapshots or increase the size of partitions later if necessary.

  • Create a user account for the system administrator who will be performing the installation. Make that user a member of the wheel group so that they will be able to perform commands as root using sudo.

  • Install git and python3:

      sudo yum install -y git python3 python3-pip
    
  • Install ansible with pip. The --user option causes it to be installed in ~ /.local, to avoid interfering with python packages provided by CentOS.

      # Note: do not use sudo here.
      python3 -m pip install --user --upgrade pip
      python3 -m pip install --user ansible==4.5.0
    
  • Disable SELinux and reboot:

      sudo sed -i s/SELINUX=enforcing/SELINUX=disabled/ /etc/selinux/config
      sudo shutdown -r now
    

Note

If you feel SELinux is important to the security of your server, please start a conversation with us at help@iri.columbia.edu.

Create a configuration repository

  • Log back into the server after rebooting. Create a git repository to track your Data Library configuration. At IRI we call ours dlconfig.

      mkdir dlconfig
      cd dlconfig
      git init
    
  • Inside the new git repository, install the IRIDL ansible collection and dependencies:

      ansible-galaxy collection install \
          -p . \
          git+https://github.com/iridl/iridl-ansible.git \
          community.docker
    
  • The previous command should have downloaded the collection to a subdirectory called ansible_collections. Add and commit that directory to your git repository, to ensure that you will use the same version of the collection every time you run the playbook.

      git add ansible_collections
      git commit -m "add iridl ansible collection"
    
  • Copy template configuration files from the collection to the top level of the repository:

      cp ansible_collections/iridl/iridl/example/* .
    
  • Move secrets.yaml out of the git repository. For security reasons, unencrypted secrets should not be committed to version control.

      mv secrets.yaml ..
    
  • Modify playbook.yaml and ../secrets.yaml to customize them to the specifics of your site. The files that you copied contain example configuration values that should be replaced with real email addresses, usernames, etc. The files include comments that explain the purpose of each configuration option. If you are not ready to set up your real Data Library server but merely want to practice the installation process, e.g. in a virtual machine, you can use the example files without modification.

  • Commit your customizations and push them to your git server for safe keeping; back up secrets.yaml by other means, such as copying it to another machine.

Never edit the contents of the ansible_collections directory. All customization should be made in the configuration files that you copied from the template. In the future when it comes time to upgrade to a newer version of the DL software, you will run the ansible-galaxy command again and commit the new version to your configuration repository. Don’t upgrade without checking the release notes first, because in some cases an upgrade may require manual migration steps. (At this writing, there are no upgrade release notes because this is the playbook’s initial release.)

Run the ansible playbook

Now we are ready to run the playbook, which will download, configure, and install the Data Library software using the parameters you defined in the configuration files.

From the root directory of the configuration repository, run the following command:

ansible-playbook \
   --ask-become-pass \
   -i inventory.cfg \
   -e @../secrets.yaml \
   -e run_update_script=yes \
   playbook.yaml

Each step of the installation will be printed to the terminal. At a site with a fast connection to the internet, the playbook generally finishes within ten minutes, but if bandwidth is limited it may take a few hours, as the installation process involves downloading several GB of software packages and container images.

You should now be able to visit your Data Library server in a browser, but the maprooms are not yet functional because the data that underlies them has yet to be installed.

Install datasets

Among other things, the ansible playbook has created structures (directories, groups, a database, and permissions) to support the installation of datasets. You can now install your data as described in Installing data. A member of the IRI staff will typically be involved in this process, as it may involve copying large amounts of data from an IRI server to yours.

Use your new Data Library

You should now be able to visit your Data Library server in a browser. For next steps, see the Maintenance page of the current guide, and the User Guide.