CERN Computer Cluster
The Valencia Computer Cluster
The TileCal Valencia computer cluster at CERN is located in building 175.
It is directly accessible from within CERN's General Public Network.
Remote access has to be done through the lxplus service.
local > ssh -X login@lxplus.cern.ch
lxplus > ssh -X login@valticalXX
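For convenience, the two-hop login can also be configured in the SSH client so that a single command reaches the cluster. This is only a sketch, assuming a reasonably recent OpenSSH on the local machine; the host alias and login name are placeholders:
# ~/.ssh/config (hypothetical alias "valtical")
Host valtical
    HostName valticalXX.cern.ch
    User login
    ForwardX11 yes
    ProxyCommand ssh -W %h:%p login@lxplus.cern.ch
With such an entry, 'ssh valtical' opens the connection through lxplus in one step.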
Link to IFIC T3 computing cluster:
https://twiki.ific.uv.es/twiki/bin/view/Atlas/IFICT3Cluster
Link to the User Questions area:
https://twiki.ific.uv.es/twiki/bin/view/Atlas/UserQuestions
Computers
Computer | Activity | Cores | Memory | Local Disk | OS |
Valtical01 | TDAQ, analysis | 4 | 6 GB | 160 GB | SLC6 |
Valtical03 | TDAQ, analysis | 2 | 4 GB | 160 GB | SLC6 |
Valtical | Xrootd pool director, PROOF master, MySQL server | 4 | 6 GB | 300 GB | SLC6 |
Valtical00 | Xrootd disk server, condor slave, PROOF WN | 16 | 24 GB | 12.6 TB | SLC6 |
Valtical04 | Xrootd disk server, condor slave, PROOF WN | 16 | 24 GB | 6 TB | SLC6 |
Valtical05 | UI, NX server, Xrootd disk server, PROOF-Lite, PROOF WN, PROOF submit machine, condor master, condor submit machine, MySQL client | 24 | 48 GB | 16 TB | SLC5 |
Valtical06 | Xrootd disk server, condor slave, PROOF WN | 16 | 24 GB | 2 TB | SLC6 |
Valtical07 | condor slave, ganglia server, NFS server for /work | 16 | 24 GB | 4 TB | SLC6 |
Valtical08 | Xrootd disk server, condor slave, PROOF WN | 16 | 24 GB | 4 TB | SLC6 |
Valtical09 | Xrootd disk server, condor slave, PROOF WN | 16 | 24 GB | 10 TB | SLC6 |
valticalui01 | User Interface, NFS server for /data6, MySQL client | 16 | 24 GB | 2 TB | SLC6 |
Valtical15 | NFS server for /data2 & /data3 | 8 | 6 GB | 1.5 TB | SLC5 |
Cluster topology
- All computers mount /data6 from valticalui01 as data storage for analysis
- All analysis computers mount /localdisk locally.
- All analysis computers use /localdisk/xrootd as xrootd file system cache.
- All computers mount /work from valtical07. To be used for collaborative code development (No data).
- Offline developments are located in /work/offline.
- Online developments are located in /work/TicalOnline.
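To check that the shared areas described above are actually mounted on a given node, a quick look at the mount table is usually enough (a sketch; the mount points are the ones listed above):
df -h /data6 /work /localdisk
mount | grep -E '/data6|/work'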
Ganglia Cluster Monitor
One can monitor the cluster load from this page:
Ganglia Monitor Link
IMPORTANT: How to delete files on xrootd
To delete files in xrootd, NEVER use rm on xrootdfs. Use this script instead:
/afs/cern.ch/user/l/lfiorini/public/xrdrm.sh $filename
$filename can be provided in the following formats:
root://valtical.cern.ch//localdisk/...
or
/xrootdfs/localdisk/xrootd/...
You can create a file list with:
find /xrootdfs/localdisk/xrootd/directory > /tmp/list.txt
and then from the command line:
for f in `cat /tmp/list.txt`; do /afs/cern.ch/user/l/lfiorini/public/xrdrm.sh $f; done
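If file names may contain spaces or shell metacharacters, a while-read loop is a safer variant of the same idea (same script and list file as above):
while read -r f; do /afs/cern.ch/user/l/lfiorini/public/xrdrm.sh "$f"; done < /tmp/list.txt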
How to use xls to query the xrootd storage:
- '-s' : This will show the size of a directory
- xls -s root://valtical.cern.ch//localdisk/xrootd/users/
- '-l': This will show the directories and files in the given directory with their size
- xls -l root://valtical.cern.ch//localdisk/xrootd/users/qing/
- '-a': This will show all files in a directory and its subdirectories:
- xls -a root://valtical.cern.ch//localdisk/xrootd/test
How to set up ROOT on SLC5 with gcc43
32-bit
Source the script:
/work/offline/common/setup32.sh
#!/bin/bash
export ROOTSYS=/afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/i686-slc5-gcc43-opt/root
export PATH=/afs/cern.ch/sw/lcg/external/Python/2.6.5/i686-slc5-gcc43-opt/bin:$ROOTSYS/bin:/afs/cern.ch/sw/lcg/external/gcc/4.3.2/x86_64-slc5/bin:$PATH
export LD_LIBRARY_PATH=$ROOTSYS/lib:/afs/cern.ch/sw/lcg/external/gcc/4.3.2/x86_64-slc5/lib:/afs/cern.ch/sw/lcg/external/mpfr/2.3.1/x86_64-slc5/lib:/afs/cern.ch/sw/lcg/external/gmp/4.2.2/x86_64-slc5/lib:/afs/cern.ch/sw/lcg/external/Python/2.6.5/i686-slc5-gcc43-opt/lib
export PYTHONPATH=$ROOTSYS/lib
64-bit
Source the script:
/work/offline/common/setup.sh
#!/bin/bash
export ROOTSYS=/afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43/root
export PATH=/afs/cern.ch/sw/lcg/external/Python/2.6.5/x86_64-slc5-gcc43/bin:$ROOTSYS/bin:/afs/cern.ch/sw/lcg/external/gcc/4.3.2/x86_64-slc5/bin:$PATH
export LD_LIBRARY_PATH=$ROOTSYS/lib:/afs/cern.ch/sw/lcg/external/gcc/4.3.2/x86_64-slc5/lib:/afs/cern.ch/sw/lcg/external/mpfr/2.3.1/x86_64-slc5/lib:/afs/cern.ch/sw/lcg/external/gmp/4.2.2/x86_64-slc5/lib:/afs/cern.ch/sw/lcg/external/Python/2.6.5/x86_64-slc5-gcc43/lib
export PYTHONPATH=$ROOTSYS/lib
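After sourcing either script, the environment can be checked quickly to confirm that the intended ROOT and Python builds were picked up (a simple sanity check, not part of the setup scripts):
source /work/offline/common/setup.sh
which root
root-config --version
root-config --arch
python -c "import ROOT; print ROOT.gROOT.GetVersion()"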
How to install and setup NX Client
Installation of NX Client
1. Download the three packages as root into /usr from here:
http://64.34.161.181/download/3.4.0/Linux/nxclient-3.4.0-7.i386.tar.gz
http://64.34.161.181/download/3.4.0/Linux/nxnode-3.4.0-16.i386.tar.gz
http://64.34.161.181/download/3.4.0/Linux/FE/nxserver-3.4.0-17.i386.tar.gz
2. Install the packages:
- cd /usr
- sudo tar zxvf nxclient-3.4.0-7.i386.tar.gz
- sudo tar zxvf nxnode-3.4.0-16.i386.tar.gz
- sudo tar zxvf nxserver-3.4.0-17.i386.tar.gz
3. Run the setup script to install the NX Node and NX Server software:
- sudo /usr/NX/scripts/setup/nxnode --install
How to set up and run the NX Client
1. Execute the tunneling script, which creates a tunnel through lxplus to one machine in the lab.
Currently only valtical00 and valtical09 have the NX server running.
Execute the python script:
python tunneling.sh
#!/usr/bin/env python
# Opens an SSH tunnel to valtical09 through lxplus (local port 10001 -> valtical09:22).
import pexpect
import getpass

user = raw_input("User:")
passw = getpass.getpass("Enter your password:")
if (user, passw) != ("", ""):
    print "Connecting to lxplus"
    ssh = pexpect.spawn('ssh -L 10001:valtical09.cern.ch:22 %s@lxplus.cern.ch' % user)
    ssh.expect('password')
    ssh.sendline(passw)
    ssh.expect('lxplus')
    # Hand the session over to the user; keep this terminal open
    # for as long as the tunnel is needed.
    ssh.interact()
Note: You need the pexpect package installed on your computer.
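If pexpect is not available, the same tunnel can be opened by hand with a plain ssh command (equivalent to what the script does; replace the user name with your own):
ssh -L 10001:valtical09.cern.ch:22 login@lxplus.cern.ch
Keep this session open for as long as the NX connection is needed.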
2. In another terminal, open the NX client.
3. Configure the session. This only needs to be done once.
Configuration:
* Host: localhost
* Port: 10001
Enter your username and password and log in. A window will open showing the remote desktop.
Initialization of the NX Client for Mac OS X Lion
- Clear your lxplus:~/.nx folder
- Clear your localhost:~/.ssh/known_hosts file
- Log in interactively to lxplus and to the valtical node from there (you will be prompted for confirmation)
- Create tunnel
- Open NX client:
- session name: valtical
- host: localhost
- port: same as in tunnel script
- type: NoMachine
NAS Access
Instructions to access the ticalnas01.ific.uv.es server:
- Ask Luca to create an account for you.
- If you want to access the NAS by NFS, it is preferable to provide the UID of the user you want to access ticalnas01 from.
- Now you can access the data from SMB and NFS.
- If you want to mount the disk via SMB or NFS on Linux, do the following:
- SMB:
- Install samba, smbclient and cifs-utils packages
- Access interactively with:
smbclient //ticalnas01.ific.uv.es/Documents -U YOUR_USERNAME
- To mount, add the following line to /etc/fstab:
//ticalnas01.ific.uv.es/Documents YOUR_MOUNT_POINT cifs noauto,users,rw,noexec,username=YOUR_USERNAME 0 0
- NFS:
- Install nfs-common package.
- Add the following line to /etc/fstab:
ticalnas01.ific.uv.es:/ YOUR_MOUNT_POINT nfs noauto,users,rw,rsize=8192,wsize=8192,hard,intr 0 0
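With either fstab entry in place, the share can be mounted and unmounted on demand, for example (a sketch assuming /mnt/ticalnas01 as the mount point; the 'users' option allows non-root mounting):
sudo mkdir -p /mnt/ticalnas01
mount /mnt/ticalnas01
ls /mnt/ticalnas01
umount /mnt/ticalnas01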
FAQ
Q. How do I change my shell to bash?
A. Change your default shell in
http://cern.ch/cernaccount under Applications and Resources > Linux and AFS > Unix shell.
XROOTD FAQ
Q. How is xrootd configured in the valtical cluster?
A. Xrootd is locally installed in the cluster under /opt/vdt-xrootd/xrootd.
Valtical acts as the manager/redirector and valtical00,valtical04,valtical05,valtical06,valtical07,valtical08,valtical09 act as data servers.
Q. What is the location of xrootd config files on the valtical machines?
A. The xrootd config file is the same for all nodes. Located in /opt/vdt-xrootd/xrootd/etc/xrootd.cfg
Q. What are the xrootd processes running on the manager?
A. The xrootd daemon, the Cluster Management Service daemon (cmsd), the Composite Name Space daemon (xrootd-cns) and the xrootdfs daemon (xrootdfsd).
Q. What are the xrootd processes running on each node?
A. The manager runs 3 instances of xrootd: one for xrootd itself, another one for cmsd and another one for xrootd-cns.
The data servers run only 2 instances: one for xrootd itself and one for cmsd. All instances are controlled by
service xrootd start/stop
service cmsd start/stop
Q. How is xrootdfs process started/stopped on the manager (valtical)?
A. Xrootdfs is installed in /opt/vdt-xrootdfs/xrootdfs. The start/stop scripts are in etc/bin. The service is controlled via:
service xrootdfs start/stop
Q. Where are the xrootd log files created?
A. The location of xrootd logs is /opt/vdt-xrootd/xrootd/var/logs.
Q: Existing files suddenly disappeared or are not accessible anymore from xrootd.
A: See the Troubleshooting section below.
How to list files under XROOTD
You can find the current list of files here:
/data6/lfiorini/lsxrootd/contents/MergedList.txt
This list is also quite handy for analysis, as it is already sorted.
The scripts producing the list are run automatically by a cron job, and the list is updated every hour.
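Since the list is a plain text file, it can be queried directly with standard tools, for example (the grep pattern is illustrative):
grep 'users/qing' /data6/lfiorini/lsxrootd/contents/MergedList.txt | head
wc -l /data6/lfiorini/lsxrootd/contents/MergedList.txt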
XROOTD Operations
XrootdFS provides a filesystem interface to the underlying storage system, on top of which user-space programs operate.
The path to the XrootdFS directory is /xrootdfs/localdisk/xrootd.
Some useful functionalities:
Read files / list directories (e.g. cat 'filename', ls 'dirname')
Create files / create directories (e.g. mkdir 'dirname')
Remove files / remove directories (e.g. rm 'filename', rm -rf 'dirname')
XROOTD Installation
1. Log in as root
2. Create the directory where xrootd is to be installed
- export INSTALL_DIR=/opt/vdt-xrootd
- mkdir $INSTALL_DIR
- cd $INSTALL_DIR
3. Install Pacman software
4. Install xrootd package from VDT cache and run the post installation scripts.
5. Configure xrootd redirector
- source setup.sh
- $VDT_LOCATION/vdt/setup/configure_xrootd --server y --this-is-xrdr --storage-path /localdisk/xrootd
--storage-cache /localdisk/xrootd --enable-security --set-data-server-xrd-port 1094 --user xrootd
- "ERROR: hostname returns 'valtical', which does not match your fully qualified hostname 'valtical.cern.ch' The Xrootd redirector will not work unless hostname returns a FQHN." could be fixed by:
* echo "137.138.40.146 valtical.cern.ch valtical" >> /etc/hosts
* modify the value of 'HOSTNAME' in /etc/sysconfig/network:
* echo "valtical.cern.ch" > /proc/sys/kernel/hostname
- edit the xrootd.cfg file located at /opt/vdt-xrootd/xrootd/etc/xrootd.cfg
6. Configure xrootd data server
- source setup.sh
- $VDT_LOCATION/vdt/setup/configure_xrootd --server y --xrdr-host valtical.cern.ch --storage-path /localdisk/xrootd
--storage-cache /localdisk/xrootd --enable-security --set-data-server-xrd-port 1094 --user xrootd
- edit the xrootd.cfg file located at /opt/vdt-xrootd/xrootd/etc/xrootd.cfg
7. Start the xrootd service
- vdt-control --non-root --on
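After starting the services, a quick check that the daemons are actually running can be done on each node (a sketch; the status action depends on the init scripts, while the ps check works regardless):
service xrootd status
service cmsd status
ps aux | grep -E '[x]rootd|[c]msd'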
EOS client installation
1. yum -y install xrootd-libs zlib readline ncurses openssl e2fsprogs-libs
2. rpm -ivh /afs/cern.ch/project/eos/rpms/slc-5-x86_64/eos-client-0.1.0-rc37.x86_64.rpm
3. yum -y install xrootd-client fuse-libs
4. rpm -ivh /afs/cern.ch/project/eos/rpms/slc-5-x86_64/eos-fuse-0.1.0-rc37.x86_64.rpm
5. export EOS_MGM_URL=root://eosatlas.cern.ch
XROOTDFS Installation
The system should have FUSE installed. FUSE can be installed using 'yum install fuse fuse-libs'.
1. Set up Pacman, make an installation directory and assign its path to $INSTALL_DIR. Install XrootdFS from the http://software.grid.iu.edu/osg-1.2 cache.
2. Configure XrootdFS:
- cd $INSTALL_DIR
- source setup.sh
- $VDT_LOCATION/vdt/setup/configure_xrootdfs --user xrootd --cache /xrootdfs/localdisk/xrootd --xrdr-host valtical.cern.ch --xrdr-storage-path /localdisk/xrootd
XROOTD Troubleshooting
Q. Error: Access to cluster files and directories via /xrootdfs/localdisk/xrootd fails.
Eg: the command ls /xrootdfs/localdisk/xrootd gives the error: Transport endpoint is not connected.
A. Check whether the partition /xrootdfs/localdisk/xrootd is mounted properly. If not,
try stopping and starting the xrootdfsd daemon as explained in the previous section.
If stopping xrootdfsd does not succeed:
- Stop the xrootd processes
- Unmount the partition using 'umount -l /xrootdfs/localdisk/xrootd'
- Start the xrootd processes
- Start the xrootdfsd daemon
Q: Existing files suddenly disappear from xrootd.
A: Most likely one of the xrootd machines hosting the files is down or hanging, or the xrootd service is no longer running on that machine.
This can also happen in conjunction with the following error message when trying to list the directory with 'xrd valtical dirlist SOMEDIR':
Error 3011: Unable to open directory SOMEDIR; no such file or directory
In server valtical.cern.ch:1094 or in some of its child nodes.
In this case, identify the machine affected by the problem. Reboot the machine if needed and restart the
cmsd and xrootd services if needed. Besides xrootd, cmsd is also needed, otherwise the files will not be accessible:
service cmsd restart
service xrootd restart
Q. Error in listing contents of the directories corresponding to xrootdfs in valtical.cern.ch
A. This error happens when the xrootd-cns daemon is no longer running on valtical.cern.ch. Check whether the xrootd-cns process is running on valtical.cern.ch.
Q. Error: Unable to write to the xrootd cluster. Error message: Last server error 3011 ('No servers are available to write the file.')
A. Check whether there is enough disk space available in the xrootd cluster.
Q. Error: Xrootd runs on the redirector and the data servers, but there is no communication between the redirector and the data servers.
A. Add rules to iptables to accept incoming tcp connections from xrootd.
Q. Error: Connection timeout
A. Make sure xrootdfs is correctly configured in /opt/vdt-xrootdfs/xrootdfs/bin/start.sh
CONDOR
Central Manager: Valtical00
Daemons running: Master, Negotiator, Collector, Procd
Submit Machine: Valtical00
Daemons running: Master, Schedd, Procd
Execute Machines: Valtical04 - Valtical09
Daemons running: Master, Startd, Procd
CONDOR CONFIGURATION
Global configuration file: /opt/condor-7.5.0.bak/etc/condor/condor_config
Local configuration file: /opt/condor-7.5.0.bak/etc/condor/condor_config.local
The configuration values can be obtained by querying with the 'condor_config_val -v variable' command or by referring to the config files.
The config values set in the global config file are overridden by the values in the local config file.
The condor_reconfig command can be used to make configuration changes take effect after editing the configuration files.
Changes to some configuration parameters (e.g. DAEMON_LIST) take effect only after restarting the master daemon with the command condor_restart.
valtical00 is set to MAX_NUM_CPUS = 1 as it is the master of the batch system.
The other valticalXX machines are set to MAX_NUM_CPUS = 12 in order to reduce the load during peak periods.
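The effective values on a given node can be verified with condor_config_val, for example:
condor_config_val -v MAX_NUM_CPUS
condor_config_val -v DAEMON_LIST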
CONDOR INSTALLATION and TESTING
Perform the following steps on valtical00, 04, 05, 06, 07, 08 and 09.
Note: "root" user must be used to perform installation
1. stop all the running condor daemons
* service condor stop
* pkill -f condor
2. find and remove the old condor rpm package
* rpm -qa | grep condor
* rpm -e old_condor_version
3. Make a backup of the previous package; in this way the *.rpmsave file will also be invalidated.
* mv /opt/condor-7.5.0 /opt/condor-7.5.0.bak
4. Download and install the YUM repository file that matches your operating system
Installing latest condor for Redhat 5
5. Install Yum's downloadonly module in order to download RPMs from the repository
- yum install yum-downloadonly
6. Download the Condor RPM from the yum repository to a temporary folder
Installation for a 64-bit machine
- yum install condor.x86_64 --downloadonly --downloaddir=/tmp
7. Install the RPM into the /opt/condor-7.5.0 folder
- rpm -ivh /tmp/condor-7.5.1-1.rhel5.i386.rpm \
--relocate /usr=/opt/condor-7.5.0/usr \
--relocate /var=/opt/condor-7.5.0/var \
--relocate /etc=/opt/condor-7.5.0/etc
- If the previous command fails due to missing rpm dependencies, install them:
* yum install libvirt
* yum install perl-XML-Simple-2.14-4.fc6.noarch
8. Edit Condor's configuration files (condor_config and condor_config.local) and the paths in the condor start script:
* cd /opt/condor-7.5.0/etc/condor/
* cp /opt/condor-7.5.0.bak/etc/condor/condor_config ./
* cp /opt/condor-7.5.0.bak/etc/condor/condor_config.local ./
9. Add the following 2 lines to /etc/profile.d/condor.sh if necessary and then source the .sh file
* export PATH=${PATH}:/opt/condor-7.5.0/usr/bin:/opt/condor-7.5.0/usr/sbin:/sbin
* export CONDOR_CONFIG=/opt/condor-7.5.0/etc/condor/condor_config
10. Start the Condor daemons
11. Submit one condor job to each WN to test that it works (a generic example submit file is sketched after the reference below)
- cd /data6/qing/condor_test
- source submit.sh
Refer :
http://www.cs.wisc.edu/condor/yum/#relocatable
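The submit.sh script above is site-specific. For reference, a minimal generic Condor submit description looks like the following (a sketch; the file name test.sub and the paths are hypothetical), and is submitted with 'condor_submit test.sub':
# test.sub - minimal vanilla-universe test job, runs hostname on a worker node
universe   = vanilla
executable = /bin/hostname
output     = test.$(Cluster).$(Process).out
error      = test.$(Cluster).$(Process).err
log        = test.log
queue 1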
DIRECTORY PATHS
Command 'condor_config_val' can be used to obtain configured values.
Use 'condor_config_val -v variable ' to get the paths of the important directories
Directories | Variable name |
logs | LOG |
binaries | RELEASE_DIR |
local directory | LOCAL_DIR |
local configuration file | LOCAL_CONFIG_FILE |
lock | LOCK |
COMMANDS TO MANAGE JOBS
Refer :
http://www.cs.wisc.edu/condor/manual/v7.4/9_Command_Reference.html
condor_q : To get the status of all queued jobs.
Options:
'-better-analyze' or '-analyze' to diagnose problems with jobs.
'-submitter' to get the condor jobs corresponding to a user.
Eg : condor_q -submitter "name" -better-analyze
condor_status: To monitor and query the condor pool for resource information, submitter information,
checkpoint server information, and daemon master information
condor_rm : Removes jobs from the pool.
Eg : condor_rm user - removes jobs submitted by the user
Eg : condor_rm cluster.process - removes the specified job
condor_prio : To change the priority of a user's job. The priority can be changed only by the job owner or root.
condor_userprio : To change a user's priority. The priority can be changed only by root.
COMMANDS TO MANAGE CONDOR DAEMONS
The DAEMON_LIST config variable lists all the condor daemons to be started on a machine.
The condor_master starts all the other daemons mentioned in the DAEMON_LIST configuration variable.
Condor daemons can be started and stopped using the following commands, which are only valid on valtical07, 08 and 09:
condor_master / condor_master_off
service condor start / service condor stop
On the other machines, stop the daemons by killing condor_master; they are started from rc.d/rc.local.
Condor daemons can also be started using the condor_on command. This command works only if the master daemon is running.
If a daemon other than the condor_master is specified with the -subsystem option, condor_on starts up only that daemon
Eg: condor_on -subsystem master : Starts the daemons listed in the DAEMON_LIST configuration variable
Eg: condor_on -subsystem schedd : Starts Schedd daemon
Condor daemons can be shut down using the condor_off command.
Eg: condor_off : Shuts down all daemons except the master.
Eg: condor_off -subsystem master : Shuts down all daemons including the condor_master.
Specifying the -subsystem option shuts down only the specified daemon.
CURRENT PRE-EMPTION POLICY (claim preemption is enabled)
PREEMPTION_REQUIREMENTS : True
When considering user priorities, the negotiator will not preempt a job running on a given machine unless the PREEMPTION_REQUIREMENTS
expression evaluates to True and the owner of the idle job has a better priority than the owner of the running job.
CLAIM_WORKLIFE : 1200
If provided, this expression specifies the number of seconds after which a claim will stop accepting additional jobs.
NEGOTIATOR_CONSIDER_PREEMPTION : True
MANAGING LOG FILES
Log Files:
{SUBSYS}_LOG - The name of the log file for a given subsystem (Collector, Negotiator, Schedd, Startd, Starter, Shadow, Master), located in the LOG directory.
Log rotation happens when the log size exceeds 1 MB. Condor stores only a single rotated file for each subsystem, so the space required for the logs of
a subsystem is 2 MB.
History Files: Located in the SPOOL directory.
History log rotation happens when the log size exceeds 2 MB. Two rotated files are stored in the SPOOL dir, making the total space required for history logs 6 MB.
TROUBLESHOOTING
condor_q -analyze : May be used to determine why certain jobs are not running, by performing an analysis on a per-machine basis for each machine in the pool.
condor_q -better-analyze : Can be used for a more thorough analysis.
Further, the logs mentioned above can be checked to figure out the problem.
PROOF
Master : Valtical
Slaves : Valtical04, Valtical05, Valtical06, Valtical07, Valtical08, Valtical09
Valtical00 is deliberately not part of the PROOF nodes, to avoid overloading the machine, which is the main NFS and XROOTD server.
PROOF CONFIGURATION
Configuration file locations : /opt/root/etc/xrootd.cfg, /opt/root/etc/proof/proof.conf
HOW TO RESTART PROOFD
If you want to restart PROOFD on all the nodes, you can use the cluster_do.py script written by L. Fiorini:
/afs/cern.ch/user/l/lfiorini/public/cluster_do.py -n valtical,valtical04,valtical05,valtical06,valtical07,valtical08,valtical09 -c "service proofd restart"
Executing Command: service proofd restart
on node valtical
on node valtical04
on node valtical05
on node valtical06
on node valtical07
on node valtical08
on node valtical09
as user: root
>>>>> Are you sure? [Yy/Nn] <<<<<
y
>>> exec on node valtical <<<
Stopping xproofd: [ OK ]
Starting xproofd: [ OK ]
>>> exec on node valtical04 <<<
Stopping xproofd: [ OK ]
Starting xproofd: [ OK ]
>>> exec on node valtical05 <<<
Stopping xproofd: [ OK ]
Starting xproofd: [ OK ]
>>> exec on node valtical06 <<<
Stopping xproofd: [ OK ]
Starting xproofd: [ OK ]
>>> exec on node valtical07 <<<
Stopping xproofd: [ OK ]
Starting xproofd: [ OK ]
>>> exec on node valtical08 <<<
Stopping xproofd: [ OK ]
Starting xproofd: [ OK ]
>>> exec on node valtical09 <<<
Stopping xproofd: [ OK ]
Starting xproofd: [ OK ]
End
PROOF Installation
1. Location-independent installation of ROOT from source
- Build ROOT:
cd root
./configure --help
./configure [] [set arch appropriately if no proper default]
(g)make [or make -j n for an n-core machine]
- Add bin/ to PATH and lib/ to LD_LIBRARY_PATH. For the sh shell family do:
. bin/thisroot.sh
Refer:
http://root.cern.ch/drupal/content/installing-root-source
2. Create the PROOF configuration files /opt/root/etc/xrootd.cfg and /opt/root/etc/proof/proof.conf
3. Create the PROOF service start/stop script and copy it to /etc/init.d/proofd
4. Start PROOF by issuing service proofd start/stop
For further details on creating the scripts and configuration files, refer to:
http://root.cern.ch/drupal/content/standard-proof-installation-cluster-machines
LOG FILES
Log entries are written to the xrootd log file at /opt/root/var/logs/xrootd.log
Starting/Stopping PROOF
PROOF can be started/stopped by issuing 'service proofd start' / 'service proofd stop'
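Once proofd is running on the master and the workers, a quick connectivity test can be performed from a ROOT session. This is only a sketch; the file name /tmp/proof_check.C is hypothetical and the master URL follows the cluster layout above:
cat > /tmp/proof_check.C <<'EOF'
// Minimal PROOF connectivity check: open a session on the master and print its status.
void proof_check() {
   TProof *p = TProof::Open("valtical.cern.ch");
   if (p) p->Print();
}
EOF
root -b -q /tmp/proof_check.C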
check the temperature of cpu cores
- yum -y install lm_sensors
- sensors-detect
- sensors
Large scale file deletion in xrootd system
- put the list of directories to be deleted in /data6/qing/file_deletion
- python scan.py
- python delete.py
- source delete.sh
How to download datasets from grid to CERN
- cd /afs/cern.ch/user/q/qing/grid2CERN_lcg/data
- put the name of datasets in list_data.txt
- python lcg_data.py to create all.txt
- python split.py and then run cp1.sh, cp2.sh,cp3.sh,cp4.sh,cp5.sh on 5 different valtical machines
- source /work/users/qing/data5/qing/scan_cluster/scan.sh to create a new list of xrootd files
- python lcg_check.py to create download_missing.sh
- source download_missing.sh on one or more machines.
How to check if downloaded files are broken
- cd /data6/qing/broken
- source setup.sh (Juan's working environment to test his ntuples)
- cat /work/users/qing/data5/qing/ForUsers/all_xrootd_files.txt | grep 'root://valtical.cern.ch//localdisk/xrootd/users/qing/data12_8TeV/SMDILEP_p1328_p1329/user.qing.data12_8TeV.periodH.physics_Muons.PhysCont.NTUP_SMWZ.grp14_v01_p1328_p1329_2LepSkim_v2' > MuonH.txt
- python create.py MuonH.txt > 1.sh
- source 1.sh
- python find_bad.py
How to transfer datasets from CERN to IFIC
- cd /afs/cern.ch/user/q/qing/CERN2IFIC/
- put the xrootd paths into list_data.txt
- python makelist.py to create list_files.txt
- python transfer_list.py to create files_to_transfer.txt
- source dq2_setup.sh to set up the environment
- source files_to_transfer.txt to start the transfer
- scp list_files.txt qing@ticalui01.uv.es:./private/data_transfer/list_files.txt
- open a new terminal, log in to ticalui01 and cd ~qing/private/data_transfer
- python checklist.py
- scp qing@ticalui01.uv.es:./private/data_transfer/lustre_file.txt ./
- python transfer_missing.py
- source files_to_transfer.txt
How to setup the free version of NX server which supports multi-users and multi-logins:
- NX server: NXserver_setup.sh
- Install NX client:
- Connection from outside of CERN network needs a tunnel:
- python tunnel.sh; log in to lxplus with your CERN NICE account and password.
- In the NX client configuration, set Host to localhost and Port to 10001, and then log in with your NICE account and password.
- Machines supporting NX connections:
Database setup at valtical
- yum -y install mysql-server mysql php-mysql MySQL-python
- /sbin/service mysqld start
- /sbin/chkconfig mysqld on
- mysql; create database xrootd; use xrootd; create corresponding table;
- grant select,insert on xrootd.* to 'xrootd'@'localhost';
- grant all on *.* to xrootd@'137.138.40.184';
- grant all on *.* to xrootd@'137.138.40.143';
- grant all on *.* to xrootd@'137.138.40.190';
- grant all on *.* to xrootd@'137.138.40.186';
- grant all on *.* to xrootd@'137.138.40.165';
- grant all on *.* to xrootd@'137.138.40.181';
- grant all on *.* to xrootd@'137.138.40.166';
- grant all on *.* to xrootd@'137.138.40.140';
- grant all on *.* to xrootd@'137.138.40.173';
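The grants can then be tested from one of the client nodes listed in the cluster table, for example (a sketch; it prompts for the password configured for the xrootd account):
mysql -h valtical.cern.ch -u xrootd -p xrootd -e 'show tables;'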
Pyroot setup at valtical00:
- install pyroot
- cd /work/users/qing/software
- mkdir root5.28
- wget ftp://root.cern.ch/root/root_v5.28.00b.source.tar.gz
- tar -xvzf root_v5.28.00b.source.tar.gz
- cd root
- ./configure --with-python-incdir=/work/users/qing/software/python2.4/include/python2.4 --with-python-libdir=/work/users/qing/software/python2.4/lib --prefix="/work/users/qing/software/root/root5.28" --etcdir="/work/users/qing/software/root/root5.28/etc"
- gmake
- gmake install
- Environment setup before using ROOT:
- export ROOTSYS=/work/users/qing/software/root
- export PATH=/work/users/qing/software/python2.4/bin:$ROOTSYS/bin:$PATH
- export LD_LIBRARY_PATH=$ROOTSYS/lib:/work/users/qing/software/python2.4/lib:$LD_LIBRARY_PATH
- export PYTHONPATH=$PYTHONPATH:$ROOTSYS/lib
xrootd installation on slc6:
proof installation on slc6:
- install ROOT under /opt/, combining with xrootd and python
- yum install xrootd-private-devel (needed for root_v5.34.07; the newest xrootd-devel (version 3.3) RPMs for Fedora/RedHat/SL no longer include the private header files, so we need to install xrootd-private-devel to get them)
- ./configure --with-xrootd-incdir=/usr/include/xrootd/,/usr/include/xrootd/private/ --with-xrootd-libdir=/usr/lib64/ --with-python-incdir=/usr/include/python2.6/ --with-python-libdir=/usr/lib64
- create /etc/init.d/proofd
- configure /opt/root/etc/xproofd.cfg and /opt/root/etc/proof/proof.conf using template from valtical06.
condor installation on slc6:
condor debug:
- In condor_config, add 'TOOL_DEBUG = D_ALL,SUBMIT_DEBUG = D_ALL,SCHEDD_DEBUG = D_ALL'
install 32 bit libraries on slc6:
- yum install libaio.so.1 libcrypto.so.6
install display on valticalui01
install missing lib for grid env:
- yum -y install openssl098e
- yum -y install compat-expat1-1.95.8-8.el6.i686
- yum -y install expat-devel.x86_64
- yum -y install expat-devel.i686
- ln -s /lib64/libexpat.so.1.5.2 /usr/lib64/libexpat.so.0
- yum install compat-openldap-2.3.43-2.el6.x86_64
Current TODO List
- Have stable xrootd and xrootdfs services
- Propagate condor installation from valtical07,08,09 to valtical00, valtical04,valtical05
- Get reliable data storage information in xrootd
- Finish installation of the new 1.8 TB disks
- De/Recommission valtical15
Maintenance tasks
On request
- Install new packages in all computers
- Add new users
- Change user default settings
- Remove old users
- Check nfs, xrootd, condor status
Daily
- Check for and kill zombie processes.
- Check CPU and memory consumption.
- Free cached memory.
- Check SAM performance and Ganglia status
Weekly
- Check for package upgrades (No condor updates for the moment)
- Check disk space status
- Warn users using a considerable amount of disk space
- Help users migrate data from NFS to xrootd
- Check /var/log/messages for SMART messages indicating disk problems
Monthly
- Reboot machines in the cluster
-- Main.avalero - 13 Sep 2010