---+!! CERN Computer Cluster

<!-- Remove the following line if the automatic table of contents is not desired -->
%TOC%

---++ The Valencia Computer Cluster

The TileCal Valencia computer cluster at CERN is located in building 175. It is directly accessible from within CERN's General Public Network. Remote access has to be done through the lxplus service:

<verbatim>
local  > ssh -X login@lxplus.cern.ch
lxplus > ssh -X login@valticalXX
</verbatim>

---+++ Link to the IFIC T3 computing cluster

https://twiki.ific.uv.es/twiki/bin/view/Atlas/IFICT3Cluster

---+++ Link to the user questioning area

https://twiki.ific.uv.es/twiki/bin/view/Atlas/UserQuestions

---+++ Computers

| *Computer* | *Activity* | *Cores* | *Mem* | *Local Disk* | *OS* |
| __Valtical01__ | TDAQ, analysis | 4 | 6 GB | 160 GB | SLC6 |
| __Valtical03__ | TDAQ, analysis | 2 | 4 GB | 160 GB | SLC6 |
| __Valtical__ | Xrootd pool director, PROOF master, MySQL server | 4 | 6 GB | 300 GB | SLC6 |
| __Valtical00__ | Xrootd disk server, Condor slave, PROOF WN | 16 | 24 GB | 12.6 TB | SLC6 |
| __Valtical04__ | Xrootd disk server, Condor slave, PROOF WN | 16 | 24 GB | 6 TB | SLC6 |
| __Valtical05__ | UI, NX server, Xrootd disk server, proof_lite, PROOF WN, PROOF submit machine, Condor master, Condor submit machine, MySQL client | 24 | 48 GB | 16 TB | SLC5 |
| __Valtical06__ | Xrootd disk server, Condor slave, PROOF WN | 16 | 24 GB | 2 TB | SLC6 |
| __Valtical07__ | Condor slave, Ganglia server, NFS server for /work | 16 | 24 GB | 4 TB | SLC6 |
| __Valtical08__ | Xrootd disk server, Condor slave, PROOF WN | 16 | 24 GB | 4 TB | SLC6 |
| __Valtical09__ | Xrootd disk server, Condor slave, PROOF WN | 16 | 24 GB | 10 TB | SLC6 |
| __valticalui01__ | User interface, NFS server for /data6, MySQL client | 16 | 24 GB | 2 TB | SLC6 |
| __Valtical15__ | NFS server for /data2 & /data3 | 8 | 6 GB | 1.5 TB | SLC5 |

---+++ Cluster topology

   * All computers mount /data6 from valticalui01 as data storage for analysis.
   * All analysis computers mount /localdisk locally.
   * All analysis computers use /localdisk/xrootd as the xrootd file system cache.
   * All computers mount /work from valtical07; it is to be used for collaborative code development (no data).
   * Offline developments are located in /work/offline.
   * Online developments are located in /work/TicalOnline.
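The mount points listed above can be checked from any cluster node. Below is a minimal sketch using standard SLC tools (=mount=, =df=) and the paths from the topology list; it is only an illustration, not part of the cluster tooling.

<verbatim>
#!/bin/bash
# Minimal sketch: confirm that the shared areas described above are visible on this node.
for mp in /data6 /work; do
    mount | grep -q " $mp " && echo "OK      $mp (mounted)" || echo "MISSING $mp"
done
# /localdisk/xrootd is a local directory used as the xrootd cache, not an NFS mount.
[ -d /localdisk/xrootd ] && df -h /localdisk | tail -n 1 || echo "MISSING /localdisk/xrootd"
</verbatim>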
---+++ Ganglia Cluster Monitor

One can monitor the cluster load from this page: %BR%
[[http://valtical07.cern.ch/ganglia/?c=VALTICAL][Ganglia Monitor Link]]

---+++ IMPORTANT: How to delete files on xrootd

To delete files in xrootd, *NEVER* use ==rm== on ==xrootdfs==. %BR%
Use this script instead: ==/afs/cern.ch/user/l/lfiorini/public/xrdrm.sh $filename== %BR%
==$filename== can be provided in the following formats: =root://valtical.cern.ch//localdisk/...= or =/xrootdfs/localdisk/xrootd/...= %BR%
You can create a file list with ==find /xrootdfs/localdisk/xrootd/directory > /tmp/list.txt== %BR%
and then run from the command line: ==for f in `cat /tmp/list.txt`; do /afs/cern.ch/user/l/lfiorini/public/xrdrm.sh $f; done==

---+++ How to use xls to query the xrootd storage

   * '-s': shows the size of a directory
      * xls -s root://valtical.cern.ch//localdisk/xrootd/users/
   * '-l': shows the directories and files in the given directory with their size
      * xls -l root://valtical.cern.ch//localdisk/xrootd/users/qing/
   * '-a': shows all files in a directory and its subdirectories
      * xls -a root://valtical.cern.ch//localdisk/xrootd/test

---+++ How to setup ROOT on SLC5 with gcc43

---++++ 32-bit

Source the script =/work/offline/common/setup32.sh=:

<verbatim>
#!/bin/bash
export ROOTSYS=/afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/i686-slc5-gcc43-opt/root
export PATH=/afs/cern.ch/sw/lcg/external/Python/2.6.5/i686-slc5-gcc43-opt/bin:$ROOTSYS/bin:/afs/cern.ch/sw/lcg/external/gcc/4.3.2/x86_64-slc5/bin:$PATH
export LD_LIBRARY_PATH=$ROOTSYS/lib:/afs/cern.ch/sw/lcg/external/gcc/4.3.2/x86_64-slc5/lib:/afs/cern.ch/sw/lcg/external/mpfr/2.3.1/x86_64-slc5/lib:/afs/cern.ch/sw/lcg/external/gmp/4.2.2/x86_64-slc5/lib:/afs/cern.ch/sw/lcg/external/Python/2.6.5/i686-slc5-gcc43-opt/lib
export PYTHONPATH=$ROOTSYS/lib
</verbatim>

---++++ 64-bit

Source the script =/work/offline/common/setup.sh=:

<verbatim>
#!/bin/bash
export ROOTSYS=/afs/cern.ch/sw/lcg/app/releases/ROOT/5.28.00b/x86_64-slc5-gcc43/root
export PATH=/afs/cern.ch/sw/lcg/external/Python/2.6.5/x86_64-slc5-gcc43/bin:$ROOTSYS/bin:/afs/cern.ch/sw/lcg/external/gcc/4.3.2/x86_64-slc5/bin:$PATH
export LD_LIBRARY_PATH=$ROOTSYS/lib:/afs/cern.ch/sw/lcg/external/gcc/4.3.2/x86_64-slc5/lib:/afs/cern.ch/sw/lcg/external/mpfr/2.3.1/x86_64-slc5/lib:/afs/cern.ch/sw/lcg/external/gmp/4.2.2/x86_64-slc5/lib:/afs/cern.ch/sw/lcg/external/Python/2.6.5/x86_64-slc5-gcc43/lib
export PYTHONPATH=$ROOTSYS/lib
</verbatim>
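After sourcing either setup script, a quick check that the intended ROOT and Python builds are picked up can save debugging time. This is a minimal sketch using standard ROOT command-line tools; it is not part of the setup scripts themselves.

<verbatim>
# Minimal sanity check after sourcing setup.sh or setup32.sh (illustration only)
which root                      # should point under the AFS ROOT release above
root-config --version           # expected: 5.28/00b
root-config --arch              # i686 vs x86_64 build
python -c "import ROOT; print ROOT.gROOT.GetVersion()"   # PyROOT sees the same release
</verbatim>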
---+++ How to install and setup the NX Client

---++++ Installation of the NX Client

1. Download the three packages as root into /usr from here:
   * http://64.34.161.181/download/3.4.0/Linux/nxclient-3.4.0-7.i386.tar.gz
   * http://64.34.161.181/download/3.4.0/Linux/nxnode-3.4.0-16.i386.tar.gz
   * http://64.34.161.181/download/3.4.0/Linux/FE/nxserver-3.4.0-17.i386.tar.gz

2. Install the packages:
   * cd /usr
   * sudo tar zxvf nxclient-3.4.0-7.i386.tar.gz
   * sudo tar zxvf nxnode-3.4.0-16.i386.tar.gz
   * sudo tar zxvf nxserver-3.4.0-17.i386.tar.gz

3. Run the setup script to install the NX Node and NX Server software:
   * sudo /usr/NX/scripts/setup/nxnode --install

---++++ How to setup and execute the NX Client

1. Execute the tunneling script, which creates a tunnel to one machine in the lab through lxplus. Currently only valtical00 and valtical09 have the NX server running. Run the Python script with =python tunneling.sh=:

<verbatim>
#!/usr/bin/env python
import os
import time
import pexpect
import sys
import getpass

user = raw_input("User:")
passw = getpass.unix_getpass("Enter your password:")
if (user, passw) != ("", ""):
    print "parent thread"
    print "Connecting to lxplus"
    ssh = pexpect.spawn('ssh -L 10001:valtical09.cern.ch:22 %s@lxplus.cern.ch' % user)
    ssh.expect('password')
    ssh.sendline(passw)
    ssh.expect('lxplus')
    ssh.interact()
</verbatim>

Note: you need the pexpect package installed on your computer.

2. Open the NX client in another terminal:
   * /usr/NX/bin/nxclient

3. Configure the session (this should be done only once):
   * Host: localhost
   * Port: 10001

Write your username and password and log in. A window will appear and you will see the remote desktop.

---++++ Initialization of the NX Client for MacOS Lion

   1. Clear your lxplus:~/.nx folder.
   1. Clear your localhost:~/.ssh/known_hosts file.
   1. Login interactively to lxplus and to the valtical node from there (you will be prompted for confirmation).
   1. Create the tunnel.
   1. Open the NX client:
      * session name: valtical
      * host: localhost
      * port: same as in the tunnel script
      * type: NoMachine

---+++ FAQ

Q. How to change the shell to bash?

A. Change your default shell in http://cern.ch/cernaccount under Applications and Resources > Linux and AFS > Unix shell.

---++++ XROOTD FAQ

Q. How is xrootd configured in the valtical cluster?

A. Xrootd is locally installed in the cluster under /opt/vdt-xrootd/xrootd. Valtical acts as the manager/redirector and valtical00, valtical04, valtical05, valtical06, valtical07, valtical08 and valtical09 act as data servers.

Q. What is the location of the xrootd config files on the valtical machines?

A. The xrootd config file is the same for all nodes. It is located in /opt/vdt-xrootd/xrootd/etc/xrootd.cfg.

Q. What are the xrootd processes running on the manager?

A. The xrootd daemon itself, the Cluster Management Service daemon (cmsd), the Composite Name Space daemon (xrootd-cns) and the xrootdfs daemon (xrootdfsd).

Q. What are the xrootd processes running on each node?

A. The manager runs 3 instances of xrootd: one for xrootd itself, one for cmsd and one for xrootd-cns. The data servers run only 2 instances, for xrootd itself and cmsd. All instances are controlled by %BR%
service xrootd start/stop

Q. How is the xrootdfs process started/stopped on the manager (valtical)?

A. Xrootdfs is installed in /opt/vdt-xrootdfs/xrootdfs. The start/stop scripts are in etc/bin. The service is controlled via: %BR%
service xrootdfs start/stop

Q. Where are the xrootd log files created?

A. The location of the xrootd logs is /opt/vdt-xrootd/xrootd/var/logs.

---+++++ How to list files under XROOTD

You can find the current list of files here: ==/data6/lfiorini/lsxrootd/contents/MergedList.txt== %BR%
This list is also quite handy for your analysis, as it is already sorted. %BR%
The scripts producing the list are run automatically by a cronjob and the list is updated *every hour*.

---+++++ XROOTD Operations

XrootdFS provides a filesystem interface to the underlying storage system, upon which user space programs operate. %BR%
The path to the XrootdFS directory is /xrootdfs/localdisk/xrootd. Some of the useful functionalities:

   * Read files / list directories (e.g. cat 'filename', ls 'dirname')
   * Create files / create directories (e.g. mkdir 'dirname')
   * Remove files / remove directories (e.g. rm 'filename', rm -rf 'dirname'; note the warning above: for deleting cluster data use the xrdrm.sh script, not plain rm)
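As an illustration of the two access paths (the POSIX-like =xrootdfs= mount and the native =root://= protocol through the redirector), here is a minimal sketch; the user directory and file names are hypothetical.

<verbatim>
# Minimal sketch of the two access paths (hypothetical user directory "myuser")
ls /xrootdfs/localdisk/xrootd/users/myuser                # POSIX-like access through XrootdFS
mkdir /xrootdfs/localdisk/xrootd/users/myuser/newdir      # create a directory
cat /xrootdfs/localdisk/xrootd/users/myuser/notes.txt     # read a file

# Native xrootd protocol access, via the redirector listed above
xrdcp root://valtical.cern.ch//localdisk/xrootd/users/myuser/ntuple.root /tmp/
</verbatim>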
---+++++ XROOTD Installation

1. Log in as root.

2. Create the directory where xrootd is to be installed:
   * export INSTALL_DIR=/opt/vdt-xrootd
   * mkdir $INSTALL_DIR
   * cd $INSTALL_DIR

3. Install the Pacman software:
   * cd /opt
   * wget http://vdt.cs.wisc.edu/software/pacman/3.28/pacman-3.28.tar.gz
   * tar -xzvf pacman-3.28.tar.gz
   * ln -s pacman-3.28 pacman
   * cd pacman
   * source setup.sh

4. Install the xrootd package from the VDT cache and run the post-installation scripts:
   * cd $INSTALL_DIR
   * pacman -get http://vdt.cs.wisc.edu/vdt_200_cache:Xrootd
   * source setup.sh
   * vdt-post-install

5. Configure the xrootd redirector:
   * source setup.sh
   * $VDT_LOCATION/vdt/setup/configure_xrootd --server y --this-is-xrdr --storage-path /localdisk/xrootd --storage-cache /localdisk/xrootd --enable-security --set-data-server-xrd-port 1094 --user xrootd
   * The error "ERROR: hostname returns 'valtical', which does not match your fully qualified hostname 'valtical.cern.ch'. The Xrootd redirector will not work unless hostname returns a FQHN." can be fixed by:
      * echo "137.138.40.146 valtical.cern.ch valtical" >> /etc/hosts
      * modifying the value of 'HOSTNAME' in /etc/sysconfig/network
      * echo "valtical.cern.ch" > /proc/sys/kernel/hostname
   * Edit the xrootd.cfg file located at /opt/vdt-xrootd/xrootd/etc/xrootd.cfg.

6. Configure the xrootd data server:
   * source setup.sh
   * $VDT_LOCATION/vdt/setup/configure_xrootd --server y --xrdr-host valtical.cern.ch --storage-path /localdisk/xrootd --storage-cache /localdisk/xrootd --enable-security --set-data-server-xrd-port 1094 --user xrootd
   * Edit the xrootd.cfg file located at /opt/vdt-xrootd/xrootd/etc/xrootd.cfg.

7. Start the xrootd service:
   * vdt-control --non-root --on

---+++++ EOS client installation

   1. yum -y install xrootd-libs zlib readline ncurses openssl e2fsprogs-libs
   1. rpm -ivh /afs/cern.ch/project/eos/rpms/slc-5-x86_64/eos-client-0.1.0-rc37.x86_64.rpm
   1. yum -y install xrootd-client fuse-libs
   1. rpm -ivh /afs/cern.ch/project/eos/rpms/slc-5-x86_64/eos-fuse-0.1.0-rc37.x86_64.rpm
   1. export EOS_MGM_URL=root://eosatlas.cern.ch

---+++++ XROOTDFS Installation

The system should have FUSE installed. FUSE can be installed using 'yum install fuse fuse-libs'.

1. Setup Pacman, make an installation directory and assign its path to $INSTALL_DIR. Install XrootdFS from the http://software.grid.iu.edu/osg-1.2 cache:
   * export INSTALL_DIR=/path_to_xrootdfs_installation_directory
   * mkdir -p $INSTALL_DIR
   * cd $INSTALL_DIR
   * pacman -get http://software.grid.iu.edu/osg-1.2:XrootdFS

2. Configure XrootdFS:
   * cd $INSTALL_DIR
   * source setup.sh
   * $VDT_LOCATION/vdt/setup/configure_xrootdfs --user xrootd --cache /xrootdfs/localdisk/xrootd --xrdr-host valtical.cern.ch --xrdr-storage-path /localdisk/xrootd
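A quick way to confirm that the installation above is healthy is to check that the daemons are running and that the FUSE mount answers. This is a minimal sketch, assuming the service names and paths used above (whether the init script supports a =status= action is an assumption, hence the =ps= fallback); the troubleshooting entries below cover the failure cases.

<verbatim>
# Minimal health check for the xrootd / xrootdfs setup described above (illustration only)
service xrootd status 2>/dev/null || ps -ef | egrep '[x]rootd|[c]msd'   # daemons up?
mount | grep xrootdfs                                                    # FUSE mount present?
ls /xrootdfs/localdisk/xrootd > /dev/null && echo "xrootdfs answers" \
                                          || echo "xrootdfs NOT responding"
</verbatim>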
---+++++ XROOTD Troubleshooting

Q. Error: access to cluster files and directories via /xrootdfs/localdisk/xrootd fails. %BR%
E.g. the command ls /xrootdfs/localdisk/xrootd gives the error "Transport endpoint is not connected".

A. Check whether the partition /xrootdfs/localdisk/xrootd is mounted properly. If not, try stopping and starting the xrootdfsd daemon as explained in the previous section. If stopping xrootdfsd does not succeed:
   * Stop the xrootd processes.
   * Unmount the partition using 'umount -l /xrootdfs/localdisk/xrootd'.
   * Start the xrootd processes.
   * Start the xrootdfsd daemon.

Q. Error in listing the contents of the directories corresponding to xrootdfs on valtical.cern.ch.

A. This error happens when the xrootd-cns daemon is no longer running on valtical.cern.ch. Check whether the xrootd-cns process is running on valtical.cern.ch.

Q. Error: unable to write to the xrootd cluster. Error message: Last server error 3011 ('No servers are available to write the file.').

A. Check whether there is enough disk space available in the xrootd cluster.

Q. Error: xrootd runs on the redirector and the data servers, but there is no communication between the redirector and a data server.

A. Add rules to iptables to accept incoming TCP connections from xrootd.

Q. Error: connection timeout.

A. Make sure xrootdfs is correctly configured in /opt/vdt-xrootdfs/xrootdfs/bin/start.sh.

---++++ CONDOR

Central manager: Valtical00 %BR%
Daemons running: Master, Negotiator, Collector, Procd

Submit machine: Valtical00 %BR%
Daemons running: Master, Schedd, Procd

Execute machines: Valtical04 - Valtical09 %BR%
Daemons running: Master, Startd, Procd

*CONDOR CONFIGURATION*

Global configuration file: /opt/condor-7.5.0.bak/etc/condor/condor_config %BR%
Local configuration file: /opt/condor-7.5.0.bak/etc/condor/condor_config.local %BR%
The configuration values can be obtained by querying with the 'condor_config_val -v variable' command or by referring to the config files. %BR%
The config values set in the global config file are overridden by the values in the local config file. The condor_reconfig command can be used for configuration changes to take effect after configuration file changes. %BR%
Changes to some configuration parameters (e.g. DAEMON_LIST) take effect only after restarting the master daemon with the command condor_restart. %BR%
valtical00 is set to MAX_NUM_CPUS = 1, as it is the master of the batch system; the valticalXX nodes are set to MAX_NUM_CPUS = 12 in order to reduce the load during peak periods. A minimal query/reload example is sketched below.
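As an illustration of the query and reload commands mentioned above, a minimal sketch (the variable names are standard Condor ones; MAX_NUM_CPUS and DAEMON_LIST are the ones referenced above):

<verbatim>
# Minimal sketch: inspect and reload the Condor configuration (illustration only)
condor_config_val -v MAX_NUM_CPUS      # show the value and which config file defines it
condor_config_val DAEMON_LIST          # daemons the master will start on this node
condor_reconfig                        # pick up edits to condor_config / condor_config.local
# condor_restart                       # needed instead when e.g. DAEMON_LIST itself changed
</verbatim>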
*CONDOR INSTALLATION and TESTING*

%RED% Installed in this way on valtical00, 04, 05, 06, 07, 08, 09 %ENDCOLOR% %BR%
Note: the "root" user must be used to perform the installation.

1. Stop all the running condor daemons:
   * service condor stop
   * pkill -f condor

2. Find and remove the old condor rpm package:
   * rpm -qa | grep condor
   * rpm -e old_condor_version

3. Make a backup of the previous installation; in this way the *.rpmsave file will also be invalidated:
   * mv /opt/condor-7.5.0 /opt/condor-7.5.0.bak

4. Download and install the YUM repository file that matches your operating system (here: the latest condor for Red Hat 5):
   * cd /etc/yum.repos.d
   * rm condor-stable-rhel5.repo
   * wget http://www.cs.wisc.edu/condor/yum/repo.d/condor-stable-rhel5.repo

5. Install Yum's downloadonly module in order to download the RPM from the repository:
   * yum install yum-downloadonly

6. Download the Condor RPM from the yum repository to a temporary folder (installation for a 64-bit machine):
   * yum install condor.x86_64 --downloadonly --downloaddir=/tmp

7. Install the RPM into the /opt/condor-7.5.0 folder:
   * rpm -ivh /tmp/condor-7.5.1-1.rhel5.i386.rpm --relocate /usr=/opt/condor-7.5.0/usr --relocate /var=/opt/condor-7.5.0/var --relocate /etc=/opt/condor-7.5.0/etc
   * If the previous command failed due to missing rpm dependencies, install them:
      * yum install libvirt
      * yum install perl-XML-Simple-2.14-4.fc6.noarch

8. Edit Condor's configuration files condor_config.local and condor_config, and the paths in the condor start script:
   * cd /opt/condor-7.5.0/etc/condor/
   * cp /opt/condor-7.5.0.bak/etc/condor/condor_config ./
   * cp /opt/condor-7.5.0.bak/etc/condor/condor_config.local ./

9. Add the following 2 lines to /etc/profile.d/condor.sh if necessary and then source the .sh file:
   * export PATH=${PATH}:/opt/condor-7.5.0/usr/bin:/opt/condor-7.5.0/usr/sbin:/sbin
   * export CONDOR_CONFIG=/opt/condor-7.5.0/etc/condor/condor_config

10. Start the Condor daemons:
   * service condor start

11. Submit one condor job to each WN to test that it works (see the minimal submit-file sketch below):
   * cd /data6/qing/condor_test
   * source submit.sh

Refer to: http://www.cs.wisc.edu/condor/yum/#relocatable

*DIRECTORY PATHS*

The command 'condor_config_val' can be used to obtain configured values. %BR%
Use 'condor_config_val -v variable' to get the paths of the important directories.

| *Directories* | *Variable name* |
| logs | LOG |
| binaries | RELEASE_DIR |
| local directory | LOCAL_DIR |
| local configuration file | LOCAL_CONFIG_FILE |
| lock | LOCK |

*COMMANDS TO MANAGE JOBS*

Refer to: http://www.cs.wisc.edu/condor/manual/v7.4/9_Command_Reference.html

condor_q: get the status of all queued jobs. %BR%
Options:
   * '-better-analyze' or '-analyze' to diagnose problems with jobs.
   * '-submitter' to get the condor jobs corresponding to a user.
   * E.g.: condor_q -submitter "name" -better-analyze

condor_status: monitor and query the condor pool for resource information, submitter information, checkpoint server information and daemon master information.

condor_rm: remove jobs from the pool. %BR%
E.g.: condor_rm user - removes the jobs submitted by the user. %BR%
E.g.: condor_rm cluster.process - removes the specific job.

condor_prio: change the priority of a user's job. The priority can be changed only by the job owner or root.

condor_userprio: change a user's priority. The priority can be changed only by root.
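The test in step 11 relies on a site-specific submit.sh that is not reproduced here. For reference, a minimal, self-contained submit description that exercises the commands above might look like the sketch below; the file name test.sub is hypothetical and the job simply runs /bin/hostname.

<verbatim>
# Minimal sketch: submit a trivial test job and watch it (hypothetical file name test.sub)
cat > test.sub <<'EOF'
universe   = vanilla
executable = /bin/hostname
output     = test.$(Cluster).$(Process).out
error      = test.$(Cluster).$(Process).err
log        = test.log
queue 1
EOF
condor_submit test.sub      # returns the cluster id
condor_q                    # the job should appear, then leave the queue when done
</verbatim>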
*COMMANDS TO MANAGE CONDOR DAEMONS*

The DAEMON_LIST config variable lists all the condor daemons to be started on a machine. %BR%
The condor_master starts all the other daemons mentioned in the DAEMON_LIST configuration variable. %BR%
Condor daemons can be started using the following commands.

%RED% Only true for valtical07, 08, 09 %ENDCOLOR% %BR%
condor_master / condor_master_off %BR%
service condor start / service condor stop

For the other machines use ==kill condor_master== and ==rc.d/rc.local==.

Condor daemons can be started using the condor_on command. This command works only if the master daemon is running. If a daemon other than the condor_master is specified with the -subsystem option, condor_on starts up only that daemon.
   * E.g. condor_on -subsystem master: starts the daemons listed in the DAEMON_LIST configuration variable.
   * E.g. condor_on -subsystem schedd: starts the Schedd daemon.

Condor daemons can be shut down using the condor_off command. Specifying the -subsystem option will shut down only the specified daemon.
   * E.g. condor_off: shuts down all daemons except the master.
   * E.g. condor_off -subsystem master: shuts down all daemons including the condor_master.

*CURRENT PRE-EMPTION POLICY* (claim preemption is enabled)

PREEMPTION_REQUIREMENTS: True %BR%
When considering user priorities, the negotiator will not preempt a job running on a given machine unless the PREEMPTION_REQUIREMENTS expression evaluates to True and the owner of the idle job has a better priority than the owner of the running job.

CLAIM_WORKLIFE: 1200 %BR%
If provided, this expression specifies the number of seconds after which a claim will stop accepting additional jobs.

NEGOTIATOR_CONSIDER_PREEMPTION: True

*MANAGING LOG FILES*

Log files: {SUBSYS}_LOG is the name of the log file for a given subsystem (Collector, Negotiator, Schedd, Startd, Starter, Shadow, Master), located in the LOG directory. %BR%
Log rotation happens when the log size exceeds 1 MB. Condor stores only a single rotated file for each subsystem, therefore the space required for the logs of one subsystem is 2 MB.

History files: located in the SPOOL directory. %BR%
History log rotation happens when the log size exceeds 2 MB. Two rotated files are stored in the SPOOL directory, making the total space required for history logs 6 MB.

*TROUBLESHOOTING*

condor_q -analyze: the -analyze option may be used to determine why certain jobs are not running, by performing an analysis on a per-machine basis for each machine in the pool. %BR%
condor_q -better-analyze: this option can be used for a more thorough analysis. %BR%
Further, the logs mentioned above can be checked to figure out the problem; a sketch of where to look is given below.
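Following on from the troubleshooting notes above, a minimal sketch for locating the relevant logs via the configuration; MasterLog and SchedLog are the standard Condor subsystem log names.

<verbatim>
# Minimal sketch: analyse an idle job and inspect the daemon logs (illustration only)
condor_q -better-analyze                    # why are my jobs not matching?
LOGDIR=$(condor_config_val LOG)             # resolve the LOG directory (see the table above)
ls -lt "$LOGDIR" | head                     # most recently written log files
tail -n 50 "$LOGDIR/MasterLog" "$LOGDIR/SchedLog"
</verbatim>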
---++++ PROOF

Master: Valtical %BR%
Slaves: Valtical04, Valtical05, Valtical06, Valtical07, Valtical08, Valtical09 %BR%
*Valtical00* is deliberately not part of the PROOF nodes, to avoid overloading the machine, which is the main NFS and XROOTD server.

*PROOF CONFIGURATION*

Configuration file locations: /opt/root/etc/xrootd.cfg, /opt/root/etc/proof/proof.conf

*HOW TO RESTART PROOFD*

If you want to restart PROOFD on all the nodes, you can use the cluster_do.py script written by L. Fiorini: %BR%

<verbatim>
/afs/cern.ch/user/l/lfiorini/public/cluster_do.py -n valtical,valtical04,valtical05,valtical06,valtical07,valtical08,valtical09 -c "service proofd restart"
Executing Command: service proofd restart
 on node valtical
 on node valtical04
 on node valtical05
 on node valtical06
 on node valtical07
 on node valtical08
 on node valtical09
 as user: root
>>>>> Are you sure? [Yy/Nn] <<<<<
y
>>> exec on node valtical <<<
Stopping xproofd: [ OK ]
Starting xproofd: [ OK ]
>>> exec on node valtical04 <<<
Stopping xproofd: [ OK ]
Starting xproofd: [ OK ]
>>> exec on node valtical05 <<<
Stopping xproofd: [ OK ]
Starting xproofd: [ OK ]
>>> exec on node valtical06 <<<
Stopping xproofd: [ OK ]
Starting xproofd: [ OK ]
>>> exec on node valtical07 <<<
Stopping xproofd: [ OK ]
Starting xproofd: [ OK ]
>>> exec on node valtical08 <<<
Stopping xproofd: [ OK ]
Starting xproofd: [ OK ]
>>> exec on node valtical09 <<<
Stopping xproofd: [ OK ]
Starting xproofd: [ OK ]
End
</verbatim>

*PROOF Installation*

1. Location-independent installation of ROOT from source:
   * Get the sources of the latest ROOT: %BR% wget ftp://root.cern.ch/root/root_v5.28.00c.source.tar.gz %BR% gzip -dc root_<version>.source.tar.gz | tar -xf -
   * Build ROOT: %BR% cd root %BR% ./configure --help %BR% ./configure [<arch>] [set arch appropriately if no proper default] %BR% (g)make [or make -j n for n-core machines]
   * Add bin/ to PATH and lib/ to LD_LIBRARY_PATH. For the sh shell family do: %BR% . bin/thisroot.sh
   * Try running ROOT: %BR% root
   * Refer to: http://root.cern.ch/drupal/content/installing-root-source

2. Create the PROOF configuration files /opt/root/etc/xrootd.cfg and /opt/root/etc/proof/proof.conf.

3. Create the PROOF service start/stop script and copy it to /etc/init.d/proofd.

4. Start PROOF by issuing service proofd start/stop.

For further input on creating the scripts and configuration files refer to: http://root.cern.ch/drupal/content/standard-proof-installation-cluster-machines

*LOG FILES*

The log entries are made in the xrootd log file at /opt/root/var/logs/xrootd.log.

*Starting/Stopping PROOF*

PROOF can be started/stopped by issuing 'service proofd start' / 'service proofd stop'.
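Once proofd is running on the nodes, a quick way to confirm that the cluster answers is to open a PROOF session from a ROOT prompt on a machine with the ROOT environment set up. This is a minimal sketch, assuming the master URL valtical.cern.ch from the configuration above; it is an illustration, not part of the service scripts.

<verbatim>
# Minimal sketch: check that the PROOF master answers (run with the ROOT env set up)
root -l -b <<'EOF'
TProof *p = TProof::Open("valtical.cern.ch");   // connect to the PROOF master listed above
if (p) p->Print();                              // prints the session and the active workers
.q
EOF
</verbatim>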
---+++ Check the temperature of the CPU cores

   * yum -y install lm_sensors
   * sensors-detect
   * sensors

---+++ Large-scale file deletion in the xrootd system

   * Put the list of directories to be deleted in /data6/qing/file_deletion.
   * python scan.py
   * python delete.py
   * source delete.sh

---+++ How to download datasets from the grid to CERN

   * cd /afs/cern.ch/user/q/qing/grid2CERN_lcg/data
   * Put the names of the datasets in list_data.txt.
   * python lcg_data.py to create all.txt
   * python split.py and then run cp1.sh, cp2.sh, cp3.sh, cp4.sh, cp5.sh on 5 different valtical machines
   * source /work/users/qing/data5/qing/scan_cluster/scan.sh to create a new list of xrootd files
   * python lcg_check.py to create download_missing.sh
   * source download_missing.sh on one or more machines

---+++ How to check if downloaded files are broken

   * cd /data6/qing/broken
   * source setup.sh (Juan's working environment to test his ntuples)
   * cat /work/users/qing/data5/qing/ForUsers/all_xrootd_files.txt | grep 'root://valtical.cern.ch//localdisk/xrootd/users/qing/data12_8TeV/SMDILEP_p1328_p1329/user.qing.data12_8TeV.periodH.physics_Muons.PhysCont.NTUP_SMWZ.grp14_v01_p1328_p1329_2LepSkim_v2' > MuonH.txt
   * python create.py MuonH.txt > 1.sh
   * source 1.sh
   * python find_bad.py

---+++ How to transfer datasets from CERN to IFIC

   * cd /afs/cern.ch/user/q/qing/CERN2IFIC/
   * Put the xrootd paths into list_data.txt.
   * python makelist.py to create list_files.txt
   * python transfer_list.py to create files_to_transfer.txt
   * source dq2_setup.sh to set up the environment
   * source files_to_transfer.txt to start the transfer
   * scp list_files.txt qing@ticalui01.uv.es:./private/data_transfer/list_files.txt
   * Open a new terminal, log in to ticalui01 and cd ~qing/private/data_transfer
   * python checklist.py
   * scp qing@ticalui01.uv.es:./private/data_transfer/lustre_file.txt ./
   * python transfer_missing.py (see the sketch after this list)
   * source files_to_transfer.txt
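For reference, the missing-file check between the two sites can also be done directly with standard tools. This is a minimal sketch under an assumption (the scripts above are not reproduced here): list_files.txt holds the file list at CERN and lustre_file.txt the list already present at IFIC, with comparable entries, one path per line.

<verbatim>
# Minimal sketch: find files present in the CERN list but absent from the IFIC list
sort list_files.txt  > cern_sorted.txt
sort lustre_file.txt > ific_sorted.txt
comm -23 cern_sorted.txt ific_sorted.txt > still_missing.txt   # entries only at CERN
wc -l still_missing.txt
</verbatim>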
---+++ How to setup the free version of the NX server, which supports multiple users and multiple logins

   * NX server: [[%ATTACHURL%/https://twiki.ific.uv.es/twiki/bin/viewfile/Atlas/IFICT3Cluster?rev=1;filename=NXserver_setup.sh][NXserver_setup.sh]]
   * Install the NX client:
      * wget http://64.34.173.142/download/3.5.0/Linux/nxclient-3.5.0-7.x86_64.rpm
      * rpm -Uvh nxclient-3.5.0-7.x86_64.rpm
   * A connection from outside the CERN network needs a tunnel:
      * python [[%ATTACHURL%/tunnel.sh][tunnel.sh]], then log on to lxplus with your CERN NICE account and password.
   * In the NX client configuration, set Host to localhost and Port to 10001, and then log in with your NICE account and password.
   * Machines supporting NX connections:
      * valticalui01

---+++ Database setup at valtical

   * yum -y install mysql-server mysql php-mysql MySQL-python
   * /sbin/service mysqld start
   * /sbin/chkconfig mysqld on
   * mysql; create database xrootd; use xrootd; create the corresponding table
   * grant select,insert on xrootd.* to 'xrootd'@'localhost';
   * grant all on *.* to xrootd@'137.138.40.184';
   * grant all on *.* to xrootd@'137.138.40.143';
   * grant all on *.* to xrootd@'137.138.40.190';
   * grant all on *.* to xrootd@'137.138.40.186';
   * grant all on *.* to xrootd@'137.138.40.165';
   * grant all on *.* to xrootd@'137.138.40.181';
   * grant all on *.* to xrootd@'137.138.40.166';
   * grant all on *.* to xrootd@'137.138.40.140';
   * grant all on *.* to xrootd@'137.138.40.173';

---+++ PyROOT setup at valtical00

   * Install Python:
      * cd /work/users/qing/software
      * mkdir python2.4
      * wget http://www.python.org/ftp/python/2.4.6/Python-2.4.6.tgz
      * tar -xvzf Python-2.4.6.tgz
      * cd Python-2.4.6
      * ./configure --enable-shared --prefix="/work/users/qing/software/python2.4"
      * gmake
      * gmake install
   * Install PyROOT:
      * cd /work/users/qing/software
      * mkdir root5.28
      * wget ftp://root.cern.ch/root/root_v5.28.00b.source.tar.gz
      * tar -xvzf root_v5.28.00b.source.tar.gz
      * cd root
      * ./configure --with-python-incdir=/work/users/qing/software/python2.4/include/python2.4 --with-python-libdir=/work/users/qing/software/python2.4/lib --prefix="/work/users/qing/software/root/root5.28" --etcdir="/work/users/qing/software/root/root5.28/etc"
      * gmake
      * gmake install
   * Environment setup before using ROOT:
      * export ROOTSYS=/work/users/qing/software/root
      * export PATH=/work/users/qing/software/python2.4/bin:$ROOTSYS/bin:$PATH
      * export LD_LIBRARY_PATH=$ROOTSYS/lib:/work/users/qing/software/python2.4/lib:$LD_LIBRARY_PATH
      * export PYTHONPATH=$PYTHONPATH:$ROOTSYS/lib

---+++ xrootd installation on SLC6

   * https://twiki.grid.iu.edu/bin/view/Documentation/Release3/InstallXrootd
   * rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
   * yum -y install yum-priorities
   * rpm -Uvh http://repo.grid.iu.edu/osg-el6-release-latest.rpm
   * yum install xrootd.x86_64
   * yum install libXpm-devel xrootd-private-devel xrootd-client-devel xrootd-libs-devel xrootd-client
   * The xrootd data server uses the configuration file root@valtical06:/etc/xrootd/xrootd-clustered.cfg.

---+++ PROOF installation on SLC6

   * Install ROOT under /opt/, built against xrootd and python.
   * yum install xrootd-private-devel (needed for root_v5.34.07: the newest xrootd-devel (version 3.3) RPMs for Fedora/RedHat/SL no longer include the required private header file, so we install xrootd-private-devel to get it)
   * ./configure --with-xrootd-incdir=/usr/include/xrootd/,/usr/include/xrootd/private/ --with-xrootd-libdir=/usr/lib64/ --with-python-incdir=/usr/include/python2.6/ --with-python-libdir=/usr/lib64
   * Create /etc/init.d/proofd.
   * Configure /opt/root/etc/xproofd.cfg and /opt/root/etc/proof/proof.conf using the template from valtical06.

---+++ Condor installation on SLC6

   * Remove the old condor processes and rpms.
   * cd /etc/yum.repos.d/
   * wget http://research.cs.wisc.edu/htcondor/yum/repo.d/condor-stable-rhel6.repo
   * yum install condor.x86_64
   * The WNs use the configuration at root@valtical06:/etc/condor/config.d/00personal_condor.config. A brief post-install check is sketched below.
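After the SLC6 reinstallation steps above, a couple of quick checks confirm that the node has rejoined the pool. A minimal sketch, to be run on the freshly installed worker node:

<verbatim>
# Minimal sketch: verify a freshly installed Condor worker node (illustration only)
condor_version                           # confirm the installed version
service condor start                     # start the daemons if not already running
ps -ef | grep [c]ondor_master            # the master daemon should be running
condor_status | grep -i $(hostname -s)   # the node should appear as one or more slots
</verbatim>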
---+++ Condor debug

   * In condor_config, add (one setting per line): TOOL_DEBUG = D_ALL, SUBMIT_DEBUG = D_ALL, SCHEDD_DEBUG = D_ALL

---+++ Install 32-bit libraries on SLC6

   * yum install libaio.so.1 libcrypto.so.6

---+++ Install display (ImageMagick) on valticalui01

   * http://www.imagemagick.org/script/install-source.php

---+++ Install missing libraries for the grid environment

   * yum -y install openssl098e
   * yum -y install compat-expat1-1.95.8-8.el6.i686
   * yum -y install expat-devel.x86_64
   * yum -y install expat-devel.i686
   * ln -s /lib64/libexpat.so.1.5.2 /usr/lib64/libexpat.so.0
   * yum install compat-openldap-2.3.43-2.el6.x86_64

---+++ Current TODO List

   * Have stable xrootd and xrootdfs services.
   * Propagate the condor installation from valtical07, 08, 09 to valtical00, valtical04, valtical05.
   * Get reliable data storage information in xrootd.
   * Finish the installation of the new 1.8 TB disks.
   * De/recommission valtical15.

---+++ Maintenance tasks

---++++ On request

   * Install new packages on all computers.
   * Add new users.
   * Change user default settings.
   * Remove old users.
   * Check nfs, xrootd and condor status.

---++++ Daily

   * Check for zombie processes and kill them.
   * Check CPU and memory consumption.
   * Free cached memory.
   * Check SAM performance and Ganglia status.

---++++ Weekly

   * Check for package upgrades (no condor updates for the moment).
   * Check disk space status.
   * Warn users using a considerable amount of disk space.
   * Help users migrate data from NFS to xrootd.
   * Check /var/log/messages for SMART messages indicating disk problems.

---++++ Monthly

   * Reboot the machines in the cluster.

<!-- Up to here -->

-- Main.avalero - 13 Sep 2010