Installation

This section provides instructions for installing the Data Library software on a server.

Installation of the Data Library software is automated using ansible, a configuration management tool.

Prepare the server

Before running the installation script, perform the following steps to prepare the server:

  • Install CentOS 7 (minimal server configuration). For more information on this choice of operating system, see Infrastructure.

  • Create a user account for the system administrator who will be performing the installation. Make that user a member of the wheel group so that they will be able to perform commands as root using sudo.

  • Mount a volume with at least 1TB of storage space, preferably with mirror RAID, at /data. List the volume in /etc/fstab to ensure that it will be mounted at boot time. We recommend using LVM to create logical volumes, and formatting the volume with XFS. Note that an XFS filesystem can be expanded but not shrunk, so it may be preferable to leave some disk space unallocated, to be used for snapshots or unanticipated storage needs, rather than putting all of the available space into the XFS-formatted volume.

  • Install git and python3:

      sudo yum install -y git python3
    
  • Install ansible with pip. The --user option causes it to be installed in ~/.local, to avoid interfering with python packages provided by CentOS.

      # Note: do not use sudo here.
      python3 -m pip install --user --upgrade pip
      python3 -m pip install --user ansible==4.5.0
    
  • Disable SELinux and reboot:

      sudo sed -i s/SELINUX=enforcing/SELINUX=disabled/ /etc/selinux/config
      sudo shutdown -r now
    

Note

If you feel SELinux is important to the security of your server, please start a conversation with us at help@iri.columbia.edu.

Create a configuration repository

  • Log back into the server after rebooting. Create a git repository to track your Data Library configuration. At IRI we call ours dlconfig.

      mkdir dlconfig
      cd dlconfig
      git init
    
  • Inside the new git repository, install the IRIDL ansible collection:

      ansible-galaxy collection install \
          -p . \
          git+https://github.com/iridl/iridl-ansible.git
    
  • The previous command should have downloaded the collection to a subdirectory called ansible_collections. Add and commit that directory to your git repository, to ensure that you will use the same version of the collection every time you run the playbook.

      git add ansible_collections
      git commit -m "add iridl ansible collection"
    
  • Copy template configuration files from the collection to the top level of the repository:

      cp ansible_collections/iridl/iridl/example/* .
    
  • Move secrets.yaml out of the git repository. For security reasons, unencrypted secrets should not be committed to version control.

      mv secrets.yaml ..
    
  • Modify playbook.yaml and ../secrets.yaml to customize them to the specifics of your site. The files that you copied contain example configuration values that should be replaced with real email addresses, user names, etc. The files include comments that explain the purpose of each configuration option. If you are not ready to set up your real Data Library server but merely want to practice the installation process, e.g. in a virtual machine, you can use the example files without modification.

  • Commit your customizations and push them to your git server for safe keeping; back up secrets.yaml by other means, such as copying it to another machine.

Never edit the contents of the ansible_collections directory. All customization should be made in the configuration files that you copied from the template. In the future when it comes time to upgrade to a newer version of the DL software, you will run the ansible-galaxy command again and commit the new version to your configuration repository. Don’t upgrade without checking the release notes first, because in some cases an upgrade may require manual migration steps. (At this writing, there are no upgrade release notes because this is the playbook’s initial release.)

Run the ansible playbook

Now we are ready to run the playbook, which will download, configure, and install the Data Library software using the parameters you defined in the configuration files.

From the root directory of the configuration repository, run the following command:

ansible-playbook \
    --ask-become-pass \
    -i inventory.cfg \
    -e @../secrets.yaml \
    -e run_update_script=yes \
    playbook.yaml

Each step of the installation will be printed to the terminal. At a site with a fast connection to the internet, the playbook generally finishes within ten minutes, but if bandwidth is limited it may take a few hours, as the installation process involves downloading several GB of software packages and container images.

You should now be able to visit your Data Library server in a browser, but the maprooms are not yet functional because the data that underlies them has yet to be installed.

Install datasets

Among other things, the ansible playbook has created structures (directories, groups, a database, and permissions) to support the installation of datasets. You can now install your data as described in Installing data. A member of the IRI staff will typically be involved in this process, as it may involve copying large amounts of data from an IRI server to yours.

Use your new Data Library

You should now be able to visit your Data Library server in a browser. For next steps, see the Maintenance page of the current guide, and the User Guide.