r69 - 03 Mar 2016 - 18:12:23 - Main.fiorini

CERN Computer Cluster

The Valencia Computer Cluster

The TileCal Valencia computer cluster at CERN is located in building 175. It is directly accessible from within CERN's General Public Network. Remote access must go through the lxplus service:

local > ssh -X login@lxplus.cern.ch
lxplus > ssh -X login@valticalXX
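The two-hop login can be collapsed into a single command with a client-side SSH configuration. This is only a sketch for ~/.ssh/config, assuming OpenSSH 7.3 or newer (for ProxyJump); "login" is a placeholder for your CERN account:

```
# ~/.ssh/config sketch -- "login" is a placeholder for your CERN username
Host valtical*
    HostName %h.cern.ch
    User login
    ProxyJump login@lxplus.cern.ch
    ForwardX11 yes
```

With this in place, ssh valtical05 (for example) is routed through lxplus transparently.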

Link to IFIC T3 computing cluster:

https://twiki.ific.uv.es/twiki/bin/view/Atlas/IFICT3Cluster

Link to the User Questions area:

https://twiki.ific.uv.es/twiki/bin/view/Atlas/UserQuestions

Computers

| Computer | Activity | Cores | Mem | Local Disk | OS |
| Valtical01 | TDAQ, analysis | 4 | 6 GB | 160 GB | SLC6 |
| Valtical03 | TDAQ, analysis | 2 | 4 GB | 160 GB | SLC6 |
| Valtical | Xrootd pool director, PROOF master, MySQL server | 4 | 6 GB | 300 GB | SLC6 |
| Valtical00 | Xrootd disk server, condor slave, PROOF WN | 16 | 24 GB | 12.6 TB | SLC6 |
| Valtical04 | Xrootd disk server, condor slave, PROOF WN | 16 | 24 GB | 6 TB | SLC6 |
| Valtical05 | UI, NX server, Xrootd disk server, PROOF-Lite, PROOF WN, PROOF submit machine, condor master, condor submit machine, MySQL client | 24 | 48 GB | 16 TB | SLC5 |
| Valtical06 | Xrootd disk server, condor slave, PROOF WN | 16 | 24 GB | 2 TB | SLC6 |
| Valtical07 | condor slave, ganglia server, NFS server for /work | 16 | 24 GB | 4 TB | SLC6 |
| Valtical08 | Xrootd disk server, condor slave, PROOF WN | 16 | 24 GB | 4 TB | SLC6 |
| Valtical09 | Xrootd disk server, condor slave, PROOF WN | 16 | 24 GB | 10 TB | SLC6 |
| valticalui01 | User interface, NFS server for /data6, MySQL client | 16 | 24 GB | 2 TB | SLC6 |
| Valtical15 | NFS server for /data2 & /data3 | 8 | 6 GB | 1.5 TB | SLC5 |

Cluster topology

  • All computers mount /data6 from valticalui01 as data storage for analysis
  • All analysis computers mount /localdisk locally.
  • All analysis computers use /localdisk/xrootd as xrootd file system cache.
  • All computers mount /work from valtical07; it is intended for collaborative code development (no data).

  • Offline developments are located in /work/offline.
  • Online developments are located in /work/TicalOnline.

Ganglia Cluster Monitor

One can monitor the cluster load from this page:
Ganglia Monitor Link

IMPORTANT: How to delete files on xrootd

To delete files in xrootd, NEVER use rm on xrootdfs.
Use instead this script: /afs/cern.ch/user/l/lfiorini/public/xrdrm.sh $filename
$filename can be provided in the following formats: root://valtical.cern.ch//localdisk/... or /xrootdfs/localdisk/xrootd/...
You can create a file list with: find /xrootdfs/localdisk/xrootd/directory > /tmp/list.txt
and then run from the command line: for f in `cat /tmp/list.txt`; do /afs/cern.ch/user/l/lfiorini/public/xrdrm.sh $f; done
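Since xrdrm.sh accepts both path forms, a small helper can translate the /xrootdfs form into the root:// URL form. A sketch (the function name is ours; the two formats are the ones listed above):

```shell
# Hypothetical helper: rewrite an /xrootdfs path into the equivalent
# root://valtical.cern.ch URL accepted by xrdrm.sh
to_xrootd_url() {
    printf '%s\n' "$1" | sed 's|^/xrootdfs|root://valtical.cern.ch/|'
}

to_xrootd_url /xrootdfs/localdisk/xrootd/users/example.root
# -> root://valtical.cern.ch//localdisk/xrootd/users/example.root
```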

How to use xls to query the xrootd storage:

  • '-s' : shows the size of a directory
    • xls -s root://valtical.cern.ch//localdisk/xrootd/users/
  • '-l' : shows the directories and files in the given directory with their sizes
    • xls -l root://valtical.cern.ch//localdisk/xrootd/users/qing/
  • '-a' : shows all files in a directory and its subdirectories
    • xls -a root://valtical.cern.ch//localdisk/xrootd/test

How to setup root in SLC5 with gcc43

32-bit

Source the script: /work/offline/common/setup32.sh

#!/bin/bash
export ROOTSYS=/afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/i686-slc5-gcc43-opt/root
export PATH=/afs/cern.ch/sw/lcg/external/Python/2.6.5/i686-slc5-gcc43-opt/bin:$ROOTSYS/bin:/afs/cern.ch/sw/lcg/external/gcc/4.3.2/x86_64-slc5/bin:$PATH
export LD_LIBRARY_PATH=$ROOTSYS/lib:/afs/cern.ch/sw/lcg/external/gcc/4.3.2/x86_64-slc5/lib:/afs/cern.ch/sw/lcg/external/mpfr/2.3.1/x86_64-slc5/lib:/afs/cern.ch/sw/lcg/external/gmp/4.2.2/x86_64-slc5/lib:/afs/cern.ch/sw/lcg/external/Python/2.6.5/i686-slc5-gcc43-opt/lib
export PYTHONPATH=$ROOTSYS/lib

64-bit

Source the script: /work/offline/common/setup.sh

#!/bin/bash
export ROOTSYS=/afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43/root
export PATH=/afs/cern.ch/sw/lcg/external/Python/2.6.5/x86_64-slc5-gcc43/bin:$ROOTSYS/bin:/afs/cern.ch/sw/lcg/external/gcc/4.3.2/x86_64-slc5/bin:$PATH
export LD_LIBRARY_PATH=$ROOTSYS/lib:/afs/cern.ch/sw/lcg/external/gcc/4.3.2/x86_64-slc5/lib:/afs/cern.ch/sw/lcg/external/mpfr/2.3.1/x86_64-slc5/lib:/afs/cern.ch/sw/lcg/external/gmp/4.2.2/x86_64-slc5/lib:/afs/cern.ch/sw/lcg/external/Python/2.6.5/x86_64-slc5-gcc43/lib
export PYTHONPATH=$ROOTSYS/lib

How to install and setup NX Client

Installation of NX Client

1. Download the three packages as root into /usr from here:

http://64.34.161.181/download/3.4.0/Linux/nxclient-3.4.0-7.i386.tar.gz

http://64.34.161.181/download/3.4.0/Linux/nxnode-3.4.0-16.i386.tar.gz

http://64.34.161.181/download/3.4.0/Linux/FE/nxserver-3.4.0-17.i386.tar.gz

2. Install the packages:

  • cd /usr
  • sudo tar zxvf nxclient-3.4.0-7.i386.tar.gz
  • sudo tar zxvf nxnode-3.4.0-16.i386.tar.gz
  • sudo tar zxvf nxserver-3.4.0-17.i386.tar.gz

3. Run the setup script to install the NX Node and NX Server software:

  • sudo /usr/NX/scripts/setup/nxnode --install

How to setup and execute the NX Client

1. Execute the tunneling script, which creates a tunnel to one machine in the lab through lxplus. Currently only valtical00 and valtical09 have the NX server running.

Execute the script (it is Python, despite the .sh extension): python tunneling.sh

#!/usr/bin/env python

import getpass
import pexpect

user = raw_input("User: ")
passw = getpass.getpass("Enter your password: ")

if (user, passw) != ("", ""):
    print "Connecting to lxplus"
    ssh = pexpect.spawn('ssh -L 10001:valtical09.cern.ch:22 %s@lxplus.cern.ch' % user)
    ssh.expect('password')
    ssh.sendline(passw)
    ssh.expect('lxplus')
    ssh.interact()

Note: You need the pexpect package installed on your computer.
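The ssh command issued by the script boils down to a single local port forward. A plain-shell equivalent, sketched as a helper (the function name and defaults are ours):

```shell
# Hypothetical helper: print the ssh command that forwards a local port to
# a valtical node's ssh port (22) through lxplus, as tunneling.sh does
build_tunnel_cmd() {
    local user=$1 node=$2 port=${3:-10001}
    echo "ssh -L ${port}:${node}.cern.ch:22 ${user}@lxplus.cern.ch"
}

build_tunnel_cmd login valtical09 10001
# -> ssh -L 10001:valtical09.cern.ch:22 login@lxplus.cern.ch
```

Running the printed command (rather than just echoing it) opens the tunnel; the NX client then connects to localhost:10001.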

2. Open the nxclient

In another terminal:

  • /usr/NX/bin/nxclient

3. Configure the session. This should be done only once.
Configuration:
* Host: localhost
* Port: 10001
Enter your username and password and press Login. A window will appear and you will see the remote desktop.

Initialization of the NX Client for MacOS Lion

  1. Clear your lxplus:~/.nx folder
  2. Clear your localhost:~/.ssh/known_hosts file
  3. Log in interactively to lxplus and to the valtical node from there (you will be prompted for confirmation)
  4. Create tunnel
  5. Open NX client:
    • session name: valtical
    • host: localhost
    • port: same as in tunnel script
    • type: NoMachine

NAS Access

Instructions to access the ticalnas01.ific.uv.es server:

  1. Ask Luca to create an account for you.
  2. If you want to access the NAS via NFS, it is preferable to also provide the UID of the account you will access ticalnas01 from.
  3. Now you can access the data from SMB and NFS.
  4. If you want to mount the disk via SMB or NFS on Linux, do the following:
  5. SMB:
    • Install samba, smbclient and cifs-utils packages
    • Access interactively with: smbclient //ticalnas01.ific.uv.es/Documents -U YOUR_USERNAME
    • To mount, add the following line to /etc/fstab:
    • //ticalnas01.ific.uv.es/Documents  YOUR_MOUNT_POINT       cifs     noauto,users,rw,noexec,username=YOUR_USERNAME    0       0
  6. NFS:
    • Install nfs-common package.
    • Add the following line to the /etc/fstab
    • ticalnas01.ific.uv.es:/            YOUR_MOUNT_POINT       nfs     noauto,users,rw,rsize=8192,wsize=8192,hard,intr    0       0

FAQ

Q. How to change shell to bash?

A. Change your default shell in http://cern.ch/cernaccount under Applications and Resources > Linux and AFS > Unix shell.

XROOTD FAQ

Q. How is xrootd configured in the valtical cluster?

A. Xrootd is locally installed in the cluster under /opt/vdt-xrootd/xrootd. Valtical acts as the manager/redirector and valtical00,valtical04,valtical05,valtical06,valtical07,valtical08,valtical09 act as data servers.

Q. What is the location of xrootd config files on the valtical machines?

A. The xrootd config file is the same for all nodes. Located in /opt/vdt-xrootd/xrootd/etc/xrootd.cfg

Q. What are the xrootd processes running on the manager?

A. xrootd itself, the Cluster Management Service daemon (cmsd), the xrootd Composite Name Space daemon (xrootd-cns), and the xrootdfs daemon (xrootdfsd)

Q. What are the xrootd processes running on each node?

A. The manager runs 3 instances of xrootd: one for xrootd itself, one for cmsd and one for xrootd-cns. The data servers run only 2 instances, for xrootd itself and cmsd. All instances are controlled with:
service xrootd start/stop
service cmsd start/stop

Q. How is xrootdfs process started/stopped on the manager (valtical)?

A. Xrootdfs is installed in /opt/vdt-xrootdfs/xrootdfs. The start/stop scripts are in etc/bin. The service is controlled via: service xrootdfs start/stop

Q. Where are the xrootd log files created?

A. The location of xrootd logs is /opt/vdt-xrootd/xrootd/var/logs.

Q: Existing files suddenly disappeared or are not accessible anymore from xrootd.

A: See the Troubleshooting section below.

How to list files under XROOTD

You can find the current list of files here: /data6/lfiorini/lsxrootd/contents/MergedList.txt
This list is also quite handy for your analysis, as it is already sorted.

The scripts producing the list are run automatically by a cron job, and the list is updated every hour.
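A typical way to use the list is to grep out one dataset's files before an analysis run. The sketch below builds a tiny stand-in list so it is self-contained; on the cluster you would grep /data6/lfiorini/lsxrootd/contents/MergedList.txt directly (the dataset pattern here is hypothetical):

```shell
# Stand-in for MergedList.txt so the example is self-contained
LIST=$(mktemp)
cat > "$LIST" <<'EOF'
root://valtical.cern.ch//localdisk/xrootd/users/a/data12_8TeV.f1.root
root://valtical.cern.ch//localdisk/xrootd/users/b/mc12.f2.root
root://valtical.cern.ch//localdisk/xrootd/users/a/data12_8TeV.f3.root
EOF

# Select the files of one (hypothetical) dataset into a per-job file list
grep 'data12_8TeV' "$LIST" > myfiles.txt
wc -l < myfiles.txt
```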

XROOTD Operations

XrootdFS provides a filesystem interface to the underlying storage system, upon which user-space programs operate.
The path to the XrootdFS directory is /xrootdfs/localdisk/xrootd.

Some of the useful functionalities:

Read files / list directories (e.g. cat 'filename', ls 'dirname')
Create files / create directories (e.g. mkdir 'dirname')
Remove files / remove directories (e.g. rm 'filename', rm -rf 'dirname')
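The same operations, run against a scratch directory standing in for /xrootdfs/localdisk/xrootd (remember: on the real mount, bulk deletions must go through xrdrm.sh, not rm):

```shell
# Scratch directory standing in for /xrootdfs/localdisk/xrootd
D=$(mktemp -d)

mkdir "$D/dirname"                  # create a directory
echo "some data" > "$D/dirname/f"   # create a file
CONTENT=$(cat "$D/dirname/f")       # read it back
ls "$D/dirname"                     # list the directory

rm -rf "$D"                         # remove everything (scratch dir only!)
```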

XROOTD Installation

1. Log in as root

2. Create the directory where xrootd is to be installed

  • export INSTALL_DIR=/opt/vdt-xrootd
  • mkdir $INSTALL_DIR
  • cd $INSTALL_DIR

3. Install Pacman software

4. Install xrootd package from VDT cache and run the post installation scripts.

5. Configure xrootd redirector

  • source setup.sh
  • $VDT_LOCATION/vdt/setup/configure_xrootd --server y --this-is-xrdr --storage-path /localdisk/xrootd
    --storage-cache /localdisk/xrootd --enable-security --set-data-server-xrd-port 1094 --user xrootd
    • "ERROR: hostname returns 'valtical', which does not match your fully qualified hostname 'valtical.cern.ch' The Xrootd redirector will not work unless hostname returns a FQHN." could be fixed by:
      * echo "137.138.40.146 valtical.cern.ch valtical" >> /etc/hosts
      * modify the value of 'HOSTNAME' in/etc/sysconfig/network:
      * echo "valtical.cern.ch" > /proc/sys/kernel/hostname

  • edit the xrootd.cfg file located at /opt/vdt-xrootd/xrootd/etc/xrootd.cfg

6. Configure xrootd data server

  • source setup.sh
  • $VDT_LOCATION/vdt/setup/configure_xrootd --server y --xrdr-host valtical.cern.ch --storage-path /localdisk/xrootd
    --storage-cache /localdisk/xrootd --enable-security --set-data-server-xrd-port 1094 --user xrootd
  • edit the xrootd.cfg file located at /opt/vdt-xrootd/xrootd/etc/xrootd.cfg

7. Start the xrootd service

  • vdt-control --non-root --on

EOS client installation

1. yum -y install xrootd-libs zlib readline ncurses openssl e2fsprogs-libs
2. rpm -ivh /afs/cern.ch/project/eos/rpms/slc-5-x86_64/eos-client-0.1.0-rc37.x86_64.rpm
3. yum -y install xrootd-client fuse-libs
4. rpm -ivh /afs/cern.ch/project/eos/rpms/slc-5-x86_64/eos-fuse-0.1.0-rc37.x86_64.rpm
5. export EOS_MGM_URL=root://eosatlas.cern.ch

XROOTDFS Installation

The system should have FUSE installed. FUSE can be installed with 'yum install fuse fuse-libs'.

1. Set up Pacman, make an installation directory and assign its path to $INSTALL_DIR. Install XrootdFS from the http://software.grid.iu.edu/osg-1.2 cache.

2. Configure XrootdFS

  • cd $INSTALL_DIR
  • source setup.sh
  • $VDT_LOCATION/vdt/setup/configure_xrootdfs --user xrootd --cache /xrootdfs/localdisk/xrootd --xrdr-host valtical.cern.ch --xrdr-storage-path /localdisk/xrootd

XROOTD Troubleshooting

Q. Error : Access to cluster files and directories via /xrootdfs/localdisk/xrootd fails
E.g. the command ls /xrootdfs/localdisk/xrootd gives the error: Transport endpoint is not connected.

A. Check whether the partition /xrootdfs/localdisk/xrootd is mounted properly. If not,
try stopping and starting the xrootdfsd daemon as explained in the previous section.
If stopping xrootdfsd does not succeed:
  • Stop the xrootd processes
  • Unmount the partition using 'umount -l /xrootdfs/localdisk/xrootd'
  • Start the xrootd processes
  • Start the xrootdfsd daemon

Q: Existing files suddenly disappear from xrootd

A: Most likely one of the xrootd machines hosting the files is down or hanging, or the xrootd service is no longer running on that machine. This can also happen together with the following error message when trying to list the directory with xrd valtical dirlist SOMEDIR :
Error 3011: Unable to open directory SOMEDIR; no such file or directory
in server valtical.cern.ch:1094 or in some of its child nodes.

In this case, identify the machine affected by the problem. Reboot the machine if needed and restart the cmsd and xrootd services if needed. Besides xrootd, cmsd is also needed; otherwise the files will not be accessible:
service cmsd restart
service xrootd restart
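When several data servers need the restart, a small loop saves typing. A dry-run sketch: it only prints the per-node commands (drop the echo to actually execute them, which requires root ssh access); the node list is the data-server list from the FAQ above:

```shell
# Print (dry-run) the restart commands for cmsd and xrootd on each node
restart_xrootd_nodes() {
    for n in "$@"; do
        echo ssh root@"$n" "'service cmsd restart && service xrootd restart'"
    done
}

restart_xrootd_nodes valtical00 valtical04 valtical05 valtical06 \
                     valtical07 valtical08 valtical09
```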

Q. Error in listing the contents of the directories under xrootdfs on valtical.cern.ch

A. This error happens when the xrootd-cns daemon is no longer running on valtical.cern.ch. Check whether the xrootd-cns process is running there.

Q. Error : Unable to write to the xrootd cluster. Error message: Last server error 3011 ('No servers are available to write the file.')

A. Check whether there is enough disk space available in the xrootd cluster.

Q. Error : Xrootd runs on the redirector and the data servers, but there is no communication between the redirector and a data server.

A. Add rules to iptables to accept incoming tcp connections from xrootd.

Q. Error: Connection timeout

A. Make sure xrootdfs is correctly configured in /opt/vdt-xrootdfs/xrootdfs/bin/start.sh

CONDOR

Central Manager : Valtical00
Daemons running : Master, Negotiator, Collector, Procd

Submit Machine : Valtical00
Daemons running : Master, Schedd, Procd

Execute Machines : Valtical04 - Valtical09
Daemons running : Master, Startd, Procd

CONDOR CONFIGURATION

Global configuration file : /opt/condor-7.5.0.bak/etc/condor/condor_config
Local configuration file : /opt/condor-7.5.0.bak/etc/condor/condor_config.local
The configuration values can be obtained by querying with the 'condor_config_val -v variable' command or by reading the config files.
The config values set in the global config file are overridden by the values in the local config file.

The condor_reconfig command can be used to make configuration changes take effect after the configuration files have been edited.

Changes to some configuration parameters (e.g. DAEMON_LIST) take effect only after restarting the master daemon with the command condor_restart.

valtical00 is set to MAX_NUM_CPUS = 1, as it is the master of the batch system; the valticalXX nodes are set to MAX_NUM_CPUS = 12 in order to reduce the load during peak periods.
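The CPU caps above correspond to entries of the following shape in condor_config.local (a sketch; the actual files live under the paths listed above):

```
# condor_config.local sketch
# on valtical00 (central manager):
MAX_NUM_CPUS = 1
# on the valticalXX execute nodes:
MAX_NUM_CPUS = 12
```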

CONDOR INSTALLATION and TESTING (performed in this way on valtical00, 04, 05, 06, 07, 08, 09)
Note: the "root" user must be used to perform the installation

1. Stop all the running condor daemons
* service condor stop
* pkill -f condor

2. Find and remove the old condor rpm package
* rpm -qa | grep condor
* rpm -e old_condor_version

3. Make a backup of the previous package; in this way the *.rpmsave file is also moved out of the way.
* mv /opt/condor-7.5.0 /opt/condor-7.5.0.bak

4. Download and install the YUM repository file that matches your operating system
Installing latest condor for Redhat 5

5. Install Yum's downloadonly module in order to download the RPM from the repository

  • yum install yum-downloadonly

6. Download the Condor RPM from the yum repository to a temporary folder
Installation for 64 bit machine

  • yum install condor.x86_64 --downloadonly --downloaddir=/tmp

7. Install the RPM into the /opt/condor-7.5.0 folder

  • rpm -ivh /tmp/condor-7.5.1-1.rhel5.i386.rpm \
    --relocate /usr=/opt/condor-7.5.0/usr \
    --relocate /var=/opt/condor-7.5.0/var \
    --relocate /etc=/opt/condor-7.5.0/etc
  • if the previous command failed due to missing rpm sources, install them
    * yum install libvirt
    * yum install perl-XML-Simple-2.14-4.fc6.noarch

8. Edit Condor's configuration files (condor_config and condor_config.local) and the paths in the condor start script.
* cd /opt/condor-7.5.0/etc/condor/
* cp /opt/condor-7.5.0.bak/etc/condor/condor_config ./
* cp /opt/condor-7.5.0.bak/etc/condor/condor_config.local ./

9. Add the following 2 lines to /etc/profile.d/condor.sh if necessary and then source the .sh file
* export PATH=${PATH}:/opt/condor-7.5.0/usr/bin:/opt/condor-7.5.0/usr/sbin:/sbin
* export CONDOR_CONFIG=/opt/condor-7.5.0/etc/condor/condor_config

10. Start the Condor daemons

  • service condor start

11. Submit one condor job to each WN to test if it works

  • cd /data6/qing/condor_test
  • source submit.sh

Refer : http://www.cs.wisc.edu/condor/yum/#relocatable

DIRECTORY PATHS

The command 'condor_config_val' can be used to obtain configured values.
Use 'condor_config_val -v variable' to get the paths of the important directories.

| Directory | Variable name |
| logs | LOG |
| binaries | RELEASE_DIR |
| local directory | LOCAL_DIR |
| local configuration file | LOCAL_CONFIG_FILE |
| lock | LOCK |

COMMANDS TO MANAGE JOBS

Refer : http://www.cs.wisc.edu/condor/manual/v7.4/9_Command_Reference.html

condor_q : To get the status of all queued jobs.
Options :
'-better-analyze' or '-analyze' to diagnose problems with jobs.
'-submitter' to get the condor jobs corresponding to a user.
Eg : condor_q -submitter "name" -better-analyze

condor_status: To monitor and query the condor pool for resource information, submitter information,
checkpoint server information, and daemon master information

condor_rm : Removes jobs from the pool
Eg : condor_rm user - removes jobs submitted by the user
Eg : condor_rm cluster.process : removes the specific job

condor_prio : To change the priority of a user's job. The priority can be changed only by the job owner or root.

condor_userprio : To change a user's priority. The priority can be changed only by root.

COMMANDS TO MANAGE CONDOR DAEMONS

The DAEMON_LIST config variable lists all the condor daemons to be started on a machine.
condor_master starts all the other daemons mentioned in the DAEMON_LIST configuration variable.
On valtical07, 08 and 09 the daemons are controlled with service condor start / service condor stop; on the other machines condor_master is launched from rc.d/rc.local and stopped by killing the condor_master process.

Condor daemons can be started using the condor_on command. This command works only if the master daemon is running.
If a daemon other than condor_master is specified with the -subsystem option, condor_on starts up only that daemon.
Eg : condor_on -subsystem master : starts the daemons listed in the DAEMON_LIST configuration variable
Eg : condor_on -subsystem schedd : starts the Schedd daemon

Condor daemons can be shut down using the condor_off command.
Eg : condor_off : shuts down all daemons except the master.
Eg : condor_off -subsystem master : shuts down all daemons including condor_master.
Specifying a daemon with the -subsystem option shuts down only that daemon.

CURRENT PRE-EMPTION POLICY ( Claim preemption is enabled)

PREEMPTION_REQUIREMENTS : True
When considering user priorities, the negotiator will not preempt a job running on a given machine unless the PREEMPTION_REQUIREMENTS
expression evaluates to True and the owner of the idle job has a better priority than the owner of the running job.

CLAIM_WORKLIFE: 1200
If provided, this expression specifies the number of seconds after which a claim will stop accepting additional jobs

NEGOTIATOR_CONSIDER_PREEMPTION: True

MANAGING LOG FILES

Log files: {SUBSYS}_LOG is the name of the log file for a given subsystem (Collector, Negotiator, Schedd, Startd, Starter, Shadow, Master), located in the LOG directory.
Log rotation happens when the log size exceeds 1 MB. Condor stores only a single rotated file per subsystem, so the space required for the logs of one subsystem is 2 MB.

History files: located in the SPOOL directory.
History log rotation happens when the log size exceeds 2 MB. Two rotated files are kept in the SPOOL dir, making the total space required for history logs 6 MB.
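Putting the figures together, the worst-case log footprint per node is easy to bound; a sketch of the arithmetic (the subsystem count is the seven listed above):

```shell
# 7 subsystem logs at 2 MB each (current file + one rotated copy),
# plus 6 MB of history files in SPOOL
echo $(( 7 * 2 + 6 ))   # -> 20 (MB)
```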

TROUBLESHOOTING

condor_q -analyze : may be used to determine why certain jobs are not running, by performing an analysis on a per-machine basis for each machine in the pool.
condor_q -better-analyze : can be used for a more thorough analysis.
Further, the logs mentioned above can be checked to figure out the problem.

PROOF

Master : Valtical
Slaves : Valtical04, Valtical05, Valtical06, Valtical07, Valtical08, Valtical09
Valtical00 is deliberately not part of the PROOF nodes, to avoid overloading the machine, which is the main NFS and XROOTD server.

PROOF CONFIGURATION

Configuration file locations : /opt/root/etc/xrootd.cfg, /opt/root/etc/proof/proof.conf

HOW TO RESTART PROOFD

If you want to restart PROOFD on all the nodes, you can use the cluster_do.py script written by L. Fiorini:

/afs/cern.ch/user/l/lfiorini/public/cluster_do.py -n valtical,valtical04,valtical05,valtical06,valtical07,valtical08,valtical09 -c "service proofd restart"
Executing Command: service proofd restart
on node valtical
on node valtical04
on node valtical05
on node valtical06
on node valtical07
on node valtical08
on node valtical09
as user: root
>>>>> Are you sure? [Yy/Nn] <<<<<
y


>>> exec on node valtical <<<
Stopping xproofd: [  OK  ]
Starting xproofd: [  OK  ]
>>> exec on node valtical04 <<<
Stopping xproofd: [  OK  ]
Starting xproofd: [  OK  ]
>>> exec on node valtical05 <<<
Stopping xproofd: [  OK  ]
Starting xproofd: [  OK  ]
>>> exec on node valtical06 <<<
Stopping xproofd: [  OK  ]
Starting xproofd: [  OK  ]
>>> exec on node valtical07 <<<
Stopping xproofd: [  OK  ]
Starting xproofd: [  OK  ]
>>> exec on node valtical08 <<<
Stopping xproofd: [  OK  ]
Starting xproofd: [  OK  ]
>>> exec on node valtical09 <<<
Stopping xproofd: [  OK  ]
Starting xproofd: [  OK  ]
End

PROOF Installation

1. Location-independent installation of ROOT from source

  • Build ROOT:
    cd root
    ./configure --help
    ./configure [] [set arch appropriately if no proper default]
    (g)make [or make -j n for n-core machines]

  • Add bin/ to PATH and lib/ to LD_LIBRARY_PATH. For the sh shell family do:
    . bin/thisroot.sh

  • Try running ROOT:
    root
Refer: http://root.cern.ch/drupal/content/installing-root-source

2. Create the PROOF configuration files /opt/root/etc/xrootd.cfg and /opt/root/etc/proof/proof.conf

3. Create the PROOF service start/stop scripts and copy them to /etc/init.d/proofd

4. Start/stop PROOF by issuing service proofd start/stop

For further input on creating the scripts and configuration files, refer to: http://root.cern.ch/drupal/content/standard-proof-installation-cluster-machines

LOG FILES

Log entries are written to the xrootd log file at /opt/root/var/logs/xrootd.log

Starting/Stopping PROOF

PROOF can be started/stopped by issuing 'service proofd start' / 'service proofd stop'

check the temperature of cpu cores

  • yum -y install lm_sensors
  • sensors-detect
  • sensors

Large scale file deletion in xrootd system

  • put the list of directories to be deleted in /data6/qing/file_deletion
  • python scan.py
  • python delete.py
  • source delete.sh

How to download datasets from grid to CERN

  • cd /afs/cern.ch/user/q/qing/grid2CERN_lcg/data
  • put the name of datasets in list_data.txt
  • python lcg_data.py to create all.txt
  • python split.py and then run cp1.sh, cp2.sh,cp3.sh,cp4.sh,cp5.sh on 5 different valtical machines
  • source /work/users/qing/data5/qing/scan_cluster/scan.sh to create a new list of xrootd files
  • python lcg_check.py to create download_missing.sh
  • source download_missing.sh on one or more machines.

How to check if downloaded files are broken

  • cd /data6/qing/broken
  • source setup.sh (Juan's working environment to test his ntuples)
  • cat /work/users/qing/data5/qing/ForUsers/all_xrootd_files.txt | grep 'root://valtical.cern.ch//localdisk/xrootd/users/qing/data12_8TeV/SMDILEP_p1328_p1329/user.qing.data12_8TeV.periodH.physics_Muons.PhysCont.NTUP_SMWZ.grp14_v01_p1328_p1329_2LepSkim_v2' > MuonH.txt
  • python create.py MuonH.txt > 1.sh
  • source 1.sh
  • python find_bad.py

How to transfer datasets from CERN to IFIC

  • cd /afs/cern.ch/user/q/qing/CERN2IFIC/
  • put the xrootd paths into list_data.txt
  • python makelist.py to create list_files.txt
  • python transfer_list.py to create files_to_transfer.txt
  • source dq2_setup.sh to set up the environment
  • source files_to_transfer.txt to start the transfer
  • scp list_files.txt qing@ticalui01.uv.es:./private/data_transfer/list_files.txt
  • open a new terminal, log in to ticalui01 and cd ~qing/private/data_transfer
  • python checklist.py
  • scp qing@ticalui01.uv.es:./private/data_transfer/lustre_file.txt ./
  • python transfer_missing.py
  • source files_to_transfer.txt

How to set up the free version of the NX server, which supports multiple users and multiple logins:

Database setup at valtical

  • yum -y install mysql-server mysql php-mysql MySQL-python
  • /sbin/service mysqld start
  • /sbin/chkconfig mysqld on
  • mysql; create database xrootd; use xrootd; create corresponding table;
  • grant select,insert on xrootd.* to 'xrootd'@'localhost';
  • grant all on *.* to xrootd@'137.138.40.184';
  • grant all on *.* to xrootd@'137.138.40.143';
  • grant all on *.* to xrootd@'137.138.40.190';
  • grant all on *.* to xrootd@'137.138.40.186';
  • grant all on *.* to xrootd@'137.138.40.165';
  • grant all on *.* to xrootd@'137.138.40.181';
  • grant all on *.* to xrootd@'137.138.40.166';
  • grant all on *.* to xrootd@'137.138.40.140';
  • grant all on *.* to xrootd@'137.138.40.173';

Pyroot setup at valtical00:

  • install pyroot
    • cd /work/users/qing/software
    • mkdir root5.28
    • wget ftp://root.cern.ch/root/root_v5.28.00b.source.tar.gz
    • tar -xvzf root_v5.28.00b.source.tar.gz
    • cd root
    • ./configure --with-python-incdir=/work/users/qing/software/python2.4/include/python2.4 --with-python-libdir=/work/users/qing/software/python2.4/lib --prefix="/work/users/qing/software/root/root5.28" --etcdir="/work/users/qing/software/root/root5.28/etc"
    • gmake
    • gmake install

  • Environment setup before using ROOT:
    • export ROOTSYS=/work/users/qing/software/root
    • export PATH=/work/users/qing/software/python2.4/bin:$ROOTSYS/bin:$PATH
    • export LD_LIBRARY_PATH=$ROOTSYS/lib:/work/users/qing/software/python2.4/lib:$LD_LIBRARY_PATH
    • export PYTHONPATH=$PYTHONPATH:$ROOTSYS/lib

xrootd installation on slc6:

proof installation on slc6:

  • install ROOT under /opt/, combining with xrootd and python
    • yum install xrootd-private-devel (needed for root_v5.34.07: the newest xrootd-devel (version 3.3) RPMs for Fedora/RedHat/SL no longer include that header file, so to get the headers we need to install xrootd-private-devel)
    • ./configure --with-xrootd-incdir=/usr/include/xrootd/,/usr/include/xrootd/private/ --with-xrootd-libdir=/usr/lib64/ --with-python-incdir=/usr/include/python2.6/ --with-python-libdir=/usr/lib64
    • create /etc/init.d/proofd
    • configure /opt/root/etc/xproofd.cfg and /opt/root/etc/proof/proof.conf using template from valtical06.

condor installation on slc6:

condor debug:

    • In condor_config, add 'TOOL_DEBUG = D_ALL', 'SUBMIT_DEBUG = D_ALL' and 'SCHEDD_DEBUG = D_ALL'

install 32 bit libraries on slc6:

    • yum install libaio.so.1 libcrypto.so.6

install display on valticalui01

install missing lib for grid env:

    • yum -y install openssl098e
    • yum -y install compat-expat1-1.95.8-8.el6.i686
    • yum -y install expat-devel.x86_64
    • yum -y install expat-devel.i686
    • ln -s /lib64/libexpat.so.1.5.2 /usr/lib64/libexpat.so.0
    • yum install compat-openldap-2.3.43-2.el6.x86_64

Current TODO List

  • Have stable xrootd and xrootdfs services
  • Propagate condor installation from valtical07,08,09 to valtical00, valtical04,valtical05
  • Get reliable data storage information in xrootd
  • Finish installation of the new 1.8 TB disks
  • De/Recommission valtical15

Maintenance tasks

On request

  • Install new packages in all computers
  • Add new users
  • Change user default settings
  • Remove old users
  • Check nfs, xrootd, condor status

Daily

  • Check and kill for zombie processes.
  • Check CPU and memory consumption.
  • Free cached memory.
  • Check SAM performance and Ganglia status

Weekly

  • Check for package upgrades (No condor updates for the moment)
  • Check disk space status
  • Warn users using considerable amount of disk space
  • Help users migrate data from NFS to xrootd
  • Check /var/log/messages for SMART message of disk problem

Monthly

  • Reboot machines in the cluster

-- Main.avalero - 13 Sep 2010
