Installation and Configuration

Installation

Python 3.7 or later is required. Gravity can be installed independently of Galaxy, but it is also a dependency of Galaxy since Galaxy 22.01. If you’ve installed Galaxy, then Gravity is already installed in Galaxy’s virtualenv.

To install independently:

$ pip install gravity

To make your life easier, you are encourged to install into a virtualenv. The easiest way to do this is with Python’s built-in venv module:

$ python3 -m venv ~/gravity
$ . ~/gravity/bin/activate

Configuration

Gravity needs to know where your Galaxy configuration file is, and depending on your Galaxy layout, some additional details like the paths to its virtualenv and root directory. By default, Gravity’s configuration is defined in Galaxy’s configuration file (galaxy.yml) to be easy and familiar for Galaxy administrators. Gravity’s configuration is defined underneath the gravity key, and Galaxy’s configuration is defined underneath the galaxy key. For example:

---
gravity:
  gunicorn:
    bind: localhost:8192
galaxy:
  database_connection: postgresql:///galaxy

Configuration Search Paths

If you run galaxy or galaxyctl from the root of a Galaxy source checkout and do not specify the config file option, config/galaxy.yml or config/galaxy.yml.sample will be automatically used. To avoid having to run from the Galaxy root directory or to work with a config file in a different location, you can explicitly point Gravity at your Galaxy configuration file with the --config-file (-c) option or the $GRAVITY_CONFIG_FILE (or $GALAXY_CONFIG_FILE, as set by Galaxy’s run.sh script) environment variable. Then it’s possible to run the galaxyctl command from anywhere.

Often times it’s convenient to put the environment variable in the Galaxy user’s shell environment file, e.g.:

$ echo "export GRAVITY_CONFIG_FILE='/srv/galaxy/config/galaxy.yml'" >> ~/.bash_profile

When running Gravity as root, the following configuration files will automatically be searched for and read, unless --config-file is specified or $GRAVITY_CONFIG_FILE is set:

  • /etc/galaxy/gravity.yml
  • /etc/galaxy/galaxy.yml
  • /etc/galaxy/gravity.d/*.y(a?)ml

Splitting Gravity and Galaxy Configurations

For more advanced deployments, it is not necessary to write your entire Galaxy configuration to the Gravity config file. You can write only the Gravity configuration, and then point to your Galaxy config file with the galaxy_config_file option in the Gravity config. This can be useful for cases such as your Galaxy server being split across multiple hosts.

For example, on a deployment where the web (gunicorn) and job handler processes run on different hosts, one might have:

In gravity.yml on the web host:

---
gravity:
  galaxy_config_file: galaxy.yml
  log_dir: /var/log/galaxy
  gunicorn:
    bind: localhost:8888
  celery:
    enable: false
    enable_beat: false

In gravity.yml on the job handler host:

---
gravity:
  galaxy_config_file: galaxy.yml
  log_dir: /var/log/galaxy
  gunicorn:
    enable: false
  celery:
    enable: true
    enable_beat: true
  handlers:
    handler:
      processes: 2

See the Managing Multiple Galaxies section for additional examples.

Configuration Options

The following options in the gravity section of galaxy.yml can be used to configure Gravity:

# Configuration for Gravity process manager.
# ``uwsgi:`` section will be ignored if Galaxy is started via Gravity commands (e.g ``./run.sh``, ``galaxy`` or ``galaxyctl``).
gravity:

  # Process manager to use.
  # ``supervisor`` is the default process manager when Gravity is invoked as a non-root user.
  # ``systemd`` is the default when Gravity is invoked as root.
  # Valid options are: supervisor, systemd
  # process_manager:

  # What command to write to the process manager configs
  # `gravity` (`galaxyctl exec <service-name>`) is the default
  # `direct` (each service's actual command) is also supported.
  # Valid options are: gravity, direct
  # service_command_style: gravity

  # Use the process manager's *service instance* functionality for services that can run multiple instances.
  # Presently this includes services like gunicorn and Galaxy dynamic job handlers. Service instances are only supported if
  # ``service_command_style`` is ``gravity``, and so this option is automatically set to ``false`` if
  # ``service_command_style`` is set to ``direct``.
  # use_service_instances: true

  # umask under which services should be executed. Setting ``umask`` on an individual service overrides this value.
  # umask: '022'

  # Memory limit (in GB), processes exceeding the limit will be killed. Default is no limit. If set, this is default value
  # for all services. Setting ``memory_limit`` on an individual service overrides this value. Ignored if ``process_manager``
  # is ``supervisor``.
  # memory_limit:

  # Specify Galaxy config file (galaxy.yml), if the Gravity config is separate from the Galaxy config. Assumed to be the
  # same file as the Gravity config if a ``galaxy`` key exists at the root level, otherwise, this option is required.
  # galaxy_config_file:

  # Specify Galaxy's root directory.
  # Gravity will attempt to find the root directory, but you can set the directory explicitly with this option.
  # galaxy_root:

  # User to run Galaxy as, required when using the systemd process manager as root.
  # Ignored if ``process_manager`` is ``supervisor`` or user-mode (non-root) ``systemd``.
  # galaxy_user:

  # Group to run Galaxy as, optional when using the systemd process manager as root.
  # Ignored if ``process_manager`` is ``supervisor`` or user-mode (non-root) ``systemd``.
  # galaxy_group:

  # Set to a directory that should contain log files for the processes controlled by Gravity.
  # If not specified defaults to ``<galaxy_data_dir>/gravity/log``.
  # log_dir:

  # Set to Galaxy's virtualenv directory.
  # If not specified, Gravity assumes all processes are on PATH. This option is required in most circumstances when using
  # the ``systemd`` process manager.
  # virtualenv:

  # Select the application server.
  # ``gunicorn`` is the default application server.
  # ``unicornherder`` is a production-oriented manager for (G)unicorn servers that automates zero-downtime Galaxy server restarts,
  # similar to uWSGI Zerg Mode used in the past.
  # Valid options are: gunicorn, unicornherder
  # app_server: gunicorn

  # Override the default instance name.
  # this is hidden from you when running a single instance.
  # instance_name: _default_

  # Configuration for Gunicorn. Can be a list to run multiple gunicorns for rolling restarts.
  gunicorn:

    # Enable Galaxy gunicorn server.
    # enable: true

    # The socket to bind. A string of the form: ``HOST``, ``HOST:PORT``, ``unix:PATH``, ``fd://FD``. An IP is a valid HOST.
    # bind: localhost:8080

    # Controls the number of Galaxy application processes Gunicorn will spawn.
    # Increased web performance can be attained by increasing this value.
    # If Gunicorn is the only application on the server, a good starting value is the number of CPUs * 2 + 1.
    # 4-12 workers should be able to handle hundreds if not thousands of requests per second.
    # workers: 1

    # Gunicorn workers silent for more than this many seconds are killed and restarted.
    # Value is a positive number or 0. Setting it to 0 has the effect of infinite timeouts by disabling timeouts for all workers entirely.
    # If you disable the ``preload`` option workers need to have finished booting within the timeout.
    # timeout: 300

    # Extra arguments to pass to Gunicorn command line.
    # extra_args:

    # Use Gunicorn's --preload option to fork workers after loading the Galaxy Application.
    # Consumes less memory when multiple processes are configured. Default is ``false`` if using unicornherder, else ``true``.
    # preload:

    # umask under which service should be executed
    # umask:

    # Value of supervisor startsecs, systemd TimeoutStartSec
    # start_timeout: 15

    # Value of supervisor stopwaitsecs, systemd TimeoutStopSec
    # stop_timeout: 65

    # Amount of time to wait for a server to become alive when performing rolling restarts.
    # restart_timeout: 300

    # Memory limit (in GB). If the service exceeds the limit, it will be killed. Default is no limit or the value of the
    # ``memory_limit`` setting at the top level of the Gravity configuration, if set. Ignored if ``process_manager`` is
    # ``supervisor``.
    # memory_limit:

    # Extra environment variables and their values to set when running the service. A dictionary where keys are the variable
    # names.
    # environment: {}

  # Configuration for Celery Processes.
  celery:

    # Enable Celery distributed task queue.
    # enable: true

    # Enable Celery Beat periodic task runner.
    # enable_beat: true

    # Number of Celery Workers to start.
    # concurrency: 2

    # Log Level to use for Celery Worker.
    # Valid options are: DEBUG, INFO, WARNING, ERROR
    # loglevel: DEBUG

    # Queues to join
    # queues: celery,galaxy.internal,galaxy.external

    # Pool implementation
    # Valid options are: prefork, eventlet, gevent, solo, processes, threads
    # pool: threads

    # Extra arguments to pass to Celery command line.
    # extra_args:

    # umask under which service should be executed
    # umask:

    # Value of supervisor startsecs, systemd TimeoutStartSec
    # start_timeout: 10

    # Value of supervisor stopwaitsecs, systemd TimeoutStopSec
    # stop_timeout: 10

    # Memory limit (in GB). If the service exceeds the limit, it will be killed. Default is no limit or the value of the
    # ``memory_limit`` setting at the top level of the Gravity configuration, if set. Ignored if ``process_manager`` is
    # ``supervisor``.
    # memory_limit:

    # Extra environment variables and their values to set when running the service. A dictionary where keys are the variable
    # names.
    # environment: {}

  # Configuration for gx-it-proxy.
  gx_it_proxy:

    # Set to true to start gx-it-proxy
    # enable: false

    # gx-it-proxy version
    # version: '>=0.0.5'

    # Public-facing IP of the proxy
    # ip: localhost

    # Public-facing port of the proxy
    # port: 4002

    # Routes file to monitor.
    # Should be set to the same path as ``interactivetools_map`` in the ``galaxy:`` section. This is ignored if
    # ``interactivetools_map is set``.
    # sessions: database/interactivetools_map.sqlite

    # Include verbose messages in gx-it-proxy
    # verbose: true

    # Forward all requests to IP.
    # This is an advanced option that is only needed when proxying to remote interactive tool container that cannot be reached through the local network.
    # forward_ip:

    # Forward all requests to port.
    # This is an advanced option that is only needed when proxying to remote interactive tool container that cannot be reached through the local network.
    # forward_port:

    # Rewrite location blocks with proxy port.
    # This is an advanced option that is only needed when proxying to remote interactive tool container that cannot be reached through the local network.
    # reverse_proxy: false

    # umask under which service should be executed
    # umask:

    # Value of supervisor startsecs, systemd TimeoutStartSec
    # start_timeout: 10

    # Value of supervisor stopwaitsecs, systemd TimeoutStopSec
    # stop_timeout: 10

    # Memory limit (in GB). If the service exceeds the limit, it will be killed. Default is no limit or the value of the
    # ``memory_limit`` setting at the top level of the Gravity configuration, if set. Ignored if ``process_manager`` is
    # ``supervisor``.
    # memory_limit:

    # Extra environment variables and their values to set when running the service. A dictionary where keys are the variable
    # names.
    # environment: {}

  # Configuration for tusd server (https://github.com/tus/tusd).
  # The ``tusd`` binary must be installed manually and made available on PATH (e.g in galaxy's .venv/bin directory).
  tusd:

    # Enable tusd server.
    # If enabled, you also need to set up your proxy as outlined in https://docs.galaxyproject.org/en/latest/admin/nginx.html#receiving-files-via-the-tus-protocol.
    # enable: false

    # Path to tusd binary
    # tusd_path: tusd

    # Host to bind the tusd server to
    # host: localhost

    # Port to bind the tusd server to
    # port: 1080

    # Directory to store uploads in.
    # Must match ``tus_upload_store`` setting in ``galaxy:`` section.
    # upload_dir:

    # Comma-separated string of enabled tusd hooks.
    #
    # Leave at the default value to require authorization at upload creation time.
    # This means Galaxy's web process does not need to be running after creating the initial
    # upload request.
    #
    # Set to empty string to disable all authorization. This means data can be uploaded (but not processed)
    # without the Galaxy web process being available.
    #
    # You can find a list of available hooks at https://github.com/tus/tusd/blob/master/docs/hooks.md#list-of-available-hooks.
    # hooks_enabled_events: pre-create

    # Extra arguments to pass to tusd command line.
    # extra_args:

    # umask under which service should be executed
    # umask:

    # Value of supervisor startsecs, systemd TimeoutStartSec
    # start_timeout: 10

    # Value of supervisor stopwaitsecs, systemd TimeoutStopSec
    # stop_timeout: 10

    # Memory limit (in GB). If the service exceeds the limit, it will be killed. Default is no limit or the value of the
    # ``memory_limit`` setting at the top level of the Gravity configuration, if set. Ignored if ``process_manager`` is
    # ``supervisor``.
    # memory_limit:

    # Extra environment variables and their values to set when running the service. A dictionary where keys are the variable
    # names.
    # environment: {}

  # Configuration for Galaxy Reports.
  reports:

    # Enable Galaxy Reports server.
    # enable: false

    # Path to reports.yml, relative to galaxy.yml if not absolute
    # config_file: reports.yml

    # The socket to bind. A string of the form: ``HOST``, ``HOST:PORT``, ``unix:PATH``, ``fd://FD``. An IP is a valid HOST.
    # bind: localhost:9001

    # Controls the number of Galaxy Reports application processes Gunicorn will spawn.
    # It is not generally necessary to increase this for the low-traffic Reports server.
    # workers: 1

    # Gunicorn workers silent for more than this many seconds are killed and restarted.
    # Value is a positive number or 0. Setting it to 0 has the effect of infinite timeouts by disabling timeouts for all workers entirely.
    # timeout: 300

    # URL prefix to serve from.
    # The corresponding nginx configuration is (replace <url_prefix> and <bind> with the values from these options):
    #
    # location /<url_prefix>/ {
    #     proxy_pass http://<bind>/;
    # }
    #
    # If <bind> is a unix socket, you will need a ``:`` after the socket path but before the trailing slash like so:
    #     proxy_pass http://unix:/run/reports.sock:/;
    # url_prefix:

    # Extra arguments to pass to Gunicorn command line.
    # extra_args:

    # umask under which service should be executed
    # umask:

    # Value of supervisor startsecs, systemd TimeoutStartSec
    # start_timeout: 10

    # Value of supervisor stopwaitsecs, systemd TimeoutStopSec
    # stop_timeout: 10

    # Memory limit (in GB). If the service exceeds the limit, it will be killed. Default is no limit or the value of the
    # ``memory_limit`` setting at the top level of the Gravity configuration, if set. Ignored if ``process_manager`` is
    # ``supervisor``.
    # memory_limit:

    # Extra environment variables and their values to set when running the service. A dictionary where keys are the variable
    # names.
    # environment: {}

  # Configure dynamic handlers in this section.
  # See https://docs.galaxyproject.org/en/latest/admin/scaling.html#dynamically-defined-handlers for details.
  # handlers: {}

Galaxy Job Handlers

Gravity has support for reading Galaxy’s job configuration: it can read statically configured job handlers in the job_conf.yml or job_conf.xml files, or the job configuration inline from the job_config option in galaxy.yml. However, unless you need to statically define handlers, it is simpler to configure Gravity to run dynamically defined handlers as detailed in the Galaxy scaling documentation.

When using dynamically defined handlers, be sure to explicitly set the job handler assignment method to db-skip-locked or db-transaction-isolation to prevent the web process from also handling jobs.

Gravity State

Older versions of Gravity stored a considerable amount of config state in $GRAVITY_STATE_DIR/configstate.yaml. As of version 1.0.0, Gravity does not store state information, and this file can be removed if left over from an older installation.

Although Gravity no longer uses the config state file, it does still use a state directory for storing supervisor configs, the default log directory (if log_dir is unchanged), and the celery-beat database. This directory defaults to <galaxy_root>/database/gravity/ by way of the data_dir option in the galaxy section of galaxy.yml (which defaults to <galaxy_root>/database/).

If running multiple Galaxy servers with the same Gravity configuration as described in Managing Multiple Galaxies and if doing so using supervisor rather than systemd, the supervisor configurations will be stored in $XDG_CONFIG_HOME/galaxy-gravity ($XDG_CONFIG_HOME defaults to ~/.config/galaxy-gravity)

In any case, you can override the path to the state directory using the --state-dir option, or the $GRAVITY_STATE_DIR environment variable.

Note

Galaxy 22.01 and 22.05 automatically set $GRAVITY_STATE_DIR to <galaxy_root>/database/gravity in the virtualenv’s activation script, activate. This can be removed from the activate script when using Gravity 1.0.0 or later.