
IFIC Grid-CSIC Cluster Usage Guide


Introduction

The IFIC cluster uses the infrastructure provided by the Grid-CSIC project and offers the computing, networking and storage resources needed to run complex scientific batch jobs for general IFIC users.

These pages introduce the hardware, the working environment and usage recipes for the end users of this infrastructure.

Infrastructure

The nodes of the infrastructure are divided into three classes:

  • User Interfaces (UI): the entry point for users. They provide a working environment in which to compile and test programs. When the jobs are ready, users submit their production batch jobs to the Worker Nodes through the Job Management System.
  • Worker Nodes (WN): where the production user jobs are executed.
    • 125 nodes (1000 cores)
      • 2x Quad-Core Xeon E5420 @ 2.50 GHz
      • 16 GB RAM
      • 2x 134 GB SAS HD in RAID0
  • Storage Nodes: disk servers that store user and project data, accessible from both User Interfaces and Worker Nodes.

The operating system is CentOS 7 (RHEL 7 compatible).

Access to the resources

  • Access Procedure: follow the steps to fill in the required form to request an account on the User Interfaces. A digital certificate is not strictly needed for running on the IFIC cluster.

Working environment

  • Users access the User Interface (UI) nodes, where they develop and submit production jobs. These nodes provide a complete development environment in which to test and validate programs.

  • After validation, users can submit their codes as production batch jobs using the HTCondor Job Management System. This provides access to the computing cluster, where longer execution times are allowed on the computing nodes.

User Interface (UI)

User Interfaces are development nodes, where users can test and validate their programs. After validation, they can submit their codes as production batch jobs using the HTCondor Job Management System.

Authorized users can log in with the ssh protocol to the User Interface machines (see the example after the list):

ui03.ific.uv.es User Interface machine
ui04.ific.uv.es User Interface machine
ui05.ific.uv.es User Interface machine
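
For example, from any machine with an SSH client (here <username> is a placeholder for your cluster account):

$ ssh <username>@ui03.ific.uv.es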

The current list of UI machines and the detailed hardware configuration can be found in the Infrastructure section.

The user HOME directory resides on the Lustre filesystem (see the Storage section):

/lhome/ific/<initial_letter>/<username>   <= for IFIC users
/lhome/<groupid>/<username>  <= for external users

In addition to the user HOME directory, project space is available upon request and is also accessible from the UI:

/lustre/ific.uv.es/prj/ific/<groupid>

For IFIC users with an account in the AFS filesystem, their desktop home directory is accessible at the following path:

/afs/ific.uv.es/user/<initial_letter>/<username>
To access it, you first have to obtain a Kerberos ticket and an AFS token with one of the following command sequences:
              $ klog
   or
              $ kinit
              $ aklog
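
You can verify that the ticket and the token were obtained, for example with the standard Kerberos and OpenAFS client tools (assuming they are installed on the UI, as the commands above suggest):

              $ klist     # list your current Kerberos tickets
              $ tokens    # list your current AFS tokens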

Software is accessible on the User Interfaces as described in the Software section below.

Job Management System: HTCondor

HTCondor is the resource management system that runs on this cluster. It manages the job workflow and allows users to submit jobs for execution on the worker nodes. Direct access to the worker nodes is not allowed.

Each worker node exposes a partitionable slot that accepts jobs to be processed; HTCondor takes care of job scheduling and execution. A slot is split whenever a job does not require all of the node's resources, so that more jobs can run on the same node; CPU and memory are subtracted in chunks from the main slot.

HTCondor tries to run jobs from different users in a fair-share way. Job priorities among users take into account the CPU time previously consumed by each user, so that CPU time is shared evenly between all users.
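
As an illustration, a minimal submit description and the commands to submit and monitor it could look like the following sketch (the executable and file names are hypothetical; the resource requests are matched against the partitionable slots described above):

$ cat hello.sub
# minimal HTCondor submit file (hypothetical example)
universe       = vanilla
executable     = hello.sh
arguments      = $(Process)
output         = hello.$(Process).out
error          = hello.$(Process).err
log            = hello.log
request_cpus   = 1
request_memory = 2 GB
queue 1

$ condor_submit hello.sub    # submit the job to the cluster
$ condor_q                   # monitor your jobs in the queue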

The complete HTCondor manual can be found here.

A usage guide can be found here.

Storage

Storage is maintained on several disk servers, as detailed in the Infrastructure section.

A distributed Lustre filesystem is shared and mounted on the different nodes of the cluster, including the User Interfaces (UI) and the Worker Nodes.

This means that all data is directly available on all nodes, and no explicit file transfer is needed for it to be accessible from the worker nodes.

This includes the user home directories and the project areas.
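
A quick way to check the overall space and, if quotas are enabled, your own usage is sketched below (the lfs quota syntax assumes a reasonably recent Lustre client, so treat it as an illustration):

$ df -h /lustre/ific.uv.es                 # overall filesystem usage
$ lfs quota -u $USER /lustre/ific.uv.es    # your user quota, if quotas are enabled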

Containers

Containers are a convenient way to package and distribute software for both developers and users.

We support Singularity, as it is secure and supports several container types, including Docker images and access to DockerHub.

The current distribution documentation for users can be found here.

Example: download the latest TensorFlow nightly GPU container from Docker Hub and convert it into a Singularity image for later use:

$ mkdir ~/s.images
$ cd ~/s.images
$ singularity build tensorflow-nightly-gpu docker://tensorflow/tensorflow:nightly-gpu
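
Once built, the image can be used directly with singularity exec or singularity shell; for example (a sketch, assuming the image built above; the --nv flag binds the host NVIDIA drivers and is only useful on a node with a GPU):

$ singularity exec tensorflow-nightly-gpu python -c "import tensorflow as tf; print(tf.__version__)"
$ singularity shell --nv tensorflow-nightly-gpu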

HEP Scientific Software

CVMFS: HEP Software distribution

We adopt CVMFS as the main HEP software distribution method. The software packages are distributed in different repositories maintained by the different contributors, and they are accessible as locally mounted /cvmfs points on the User Interfaces (UI) and Worker Nodes.

The repositories currently available are the following:

CERN/SFT Repositories
External software packages are taken by PH/SFT from external sources. They are recompiled, if possible and necessary, on all SFT-provided platforms. External software packages are provided for many different areas (an example of browsing the repository follows this list), such as:
  • General tools (debugging, testing)
  • Graphics
  • Mathematical Libraries
  • Databases
  • Scripting Languages and modules
  • Grid middleware
  • Compilers
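
Since the repository is mounted locally via CVMFS, it can be browsed directly from a UI; for instance (the subdirectory names follow the README excerpt below):

$ ls /cvmfs/sft.cern.ch/lcg/
$ ls /cvmfs/sft.cern.ch/lcg/views/ | head    # available LCG releases and nightlies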

Please review the README info at /cvmfs/sft.cern.ch/README:

LCG RELEASES
============

Welcome to the LCG Releases provided by the SPI team in EP-SFT at CERN.

In this CVMFS repository `/cvmfs/sft.cern.ch` you can find a software stack containing over 450 external 
packages as well as HEP specific tools and generators. There are usually two releases per year as well
as development builds every night.

The releases start with the prefix `LCG_` followed by the major version of the release, e.g. `96`. A
major release implies major version changes in all packages of the software stack. For patches, we
append lowercase letters like `a`, `b`, ... to the name of the release.

**Example:** The release 96 provides the following versions:

- A release candidate used for testing an upcoming major release: `LCG_96rc1`
- A new major release: `LCG_96`
- A patch release (only if necessary) to fix problems in this major release: `LCG_96a`

These versions are based on Python 2.x. For most versions we also provide a Python 3.x build e.g.
`LCG_96python3`. For the Nightlies we provide the following configurations:

- `dev4` Based on the latest stable version of ROOT and Python 2.x
- `dev3` Based on the latest git HEAD of ROOT and Python 2.x (called 'Bleeding Edge' in SWAN)
- `dev3python3` Based on the latest git HEAD of ROOT and Python 3.x (called 'Bleeding Edge Python 3')

There are also builds with CUDA support and specific SIMD instruction sets.

...

And the USAGE section of the same file:

USAGE
-----

For most of our distributed software you can find both a Bash script (ending in `.sh`) as well as a
C Shell script (ending in `.csh`). By using the `source` command in your local shell with these
scripts you can change your current shell environment to use the software on CVMFS instead of your
locally installed packages. These changes only last until you close the current terminal session.

In this section we introduce the most used features of the LCG Releases in `/cvmfs/sft.cern.ch/lcg/`:

- `contrib`
  - Used for compilers, CMake and other build tools
  - We provide and maintain `gcc`, `clang`, `CMake`. The compilers come bundled with `binutils`.
  - The other subfolders in `contrib` are not as well maintained as the three mentioned above
  - **Example:** Use the latest stable GCC 9 on a CentOS 7 machine in a Bash shell:  
    `source /cvmfs/sft.cern.ch/lcg/contrib/gcc/9/x86_64-centos7/setup.sh`
- `views`
  - You can make all 450+ packages of the LCG software stack available in your current shell
    without any installations
  - The setup takes a couple of seconds in the CERN network. Outside it might take a bit longer.
  - For the nightly builds, there's a `latest` symlink that links to the latest stable build
  - **Example:** Use the LCG release 96 built with GCC 8 on a CentOS 7 machine in a Bash shell:  
    `source /cvmfs/sft.cern.ch/lcg/views/LCG_96/x86_64-centos7-gcc8-opt/setup.sh`
  - **Example:** Use the Nightly *dev4* built with Clang 8 on a CentOS 7 machine in a Bash shell:  
    `source /cvmfs/sft.cern.ch/lcg/views/dev4/latest/x86_64-centos7-clang8-opt/setup.sh`
- `releases`
  - If you only need a specific package (including its dependencies) you can also do that
  - **Example:** Use only `Geant4` from the LCG_96 release for CentOS 7 and Clang 8 in a Bash shell:  
    `source /cvmfs/sft.cern.ch/lcg/releases/LCG_96/Geant4/10.05.p01/x86_64-centos7-clang8-opt/Geant4-env.sh`
- `nightlies`
  - Same as `releases` for the nightly builds: Use single packages instead of an entire view
  - **Example:** Use only `rootpy` from Monday's dev3 Nightly for CentOS 7 and GCC 9 in a Bash shell:  
    `source /cvmfs/sft.cern.ch/lcg/nightlies/dev3/Mon/rootpy/1.0.1/x86_64-centos7-gcc9-opt/rootpy-env.sh`

An exhaustive list of all provided packages and the supported platforms is available at http://lcginfo.cern.ch.
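
On this cluster the nodes run CentOS 7, so the examples above apply directly; a typical session on a UI, in a Bash shell, might look like this (release and platform strings taken from the README examples above):

$ source /cvmfs/sft.cern.ch/lcg/views/LCG_96/x86_64-centos7-gcc8-opt/setup.sh
$ which root          # should now point into /cvmfs
$ python --version    # the Python interpreter bundled with the view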

The lcgenv configuration tool (https://gitlab.cern.ch/GENSER/lcgenv) can be used to set up the environment for the desired packages.

For example, the following commands set the environment variables needed to use the ROOT, GSL and Boost libraries:
export LCGENV_PATH=/cvmfs/sft.cern.ch/lcg/releases/
eval "` $LCGENV_PATH/lcgenv/latest/lcgenv -p LCG_93 x86_64-slc6-gcc62-opt ROOT `"
eval "` $LCGENV_PATH/lcgenv/latest/lcgenv -p LCG_93 x86_64-slc6-gcc62-opt GSL `"
eval "` $LCGENV_PATH/lcgenv/latest/lcgenv -p LCG_93 x86_64-slc6-gcc62-opt Boost`"

Other Repositories

Other CERN CVMFS repositories, maintained by their respective owners, are available at the following mount points, as detailed in the CVMFS repositories list:

/cvmfs/atlas.cern.ch
/cvmfs/lhcb.cern.ch

Locally Installed Software

  • Compilers and interpreters: Python 2.7, Python 3.6, GCC 4.8.5 (see the version check example below)
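
These are available on the UIs without any environment setup; the versions can be checked directly (the exact executable names may differ slightly depending on the installation):

$ gcc --version
$ python2.7 --version
$ python3.6 --version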

-- AlvaroFernandez - 02 Jul 2019
