r27 - 25 Nov 2013 - 10:09:36 - Main.qing

IFIC Tier3 Computing Cluster

Link to CERN Computing Cluster


Link to User Questioning Area


The IFIC Tier3 Computing Cluster

The IFIC Tier3 computing cluster is located in the IFIC cluster room. An IFIC AFS account is required to access the two UIs remotely.

local > ssh -X login@ticalui01.uv.es or login@ticalui01.ific.uv.es
local > ssh -X login@ticalui02.uv.es or login@ticalui02.ific.uv.es


Computer  | Activity                   | Cores | Mem  | Local Disk | OS     | Kernel version             | NFS server | User Quota
ticalui01 | UI & Condor Submit Machine | 8     | 48GB | 750GB      | SLC5.6 | 2.6.18-128.7.1.el5_lustre. | /work      | 20GB
ticalui02 | UI & Condor Submit Machine | 8     | 48GB | 750GB      | SLC5.6 | 2.6.18-128.7.1.el5_lustre. | /data2     | 20GB

User Space

Directory | Total size | User default Quota | NFS server | Supported user list
/work     | 540 GB     | 20GB               | ticalui01  | qing,fiorini,solans,avalero,valls,samarsan,yeherji,leoceral,march
/scratch0 | 460 GB     | 40GB               | ticalui01  | fiorini,solans,avalero,valls
/scratch1 | 460 GB     | 40GB               | ticalui01  | qing,samarsan,yeherji,leoceral,march
/data2    | 540 GB     | 20GB               | ticalui02  | qing,fiorini,solans,avalero,valls,samarsan,yeherji,leoceral,march

Cluster topology

  • /work is physically mounted on ticalui01 and /data2 is physically mounted on ticalui02; by default each user has 20GB in /work and 20GB in /data2.

How to use ROOT

  • source /afs/ific.uv.es/user/q/qing/software/bin/thisroot.sh
  • source /data2/software/root/bin/thisroot.sh
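The two entries above are alternative ROOT installations; sourcing either one sets up the environment. A small sketch that tries each path in turn (the paths are the two listed above; `root-config` ships with any ROOT installation):

```shell
#!/bin/sh
# Source the first ROOT installation found among the two listed above.
setup_root() {
    for rootsh in /data2/software/root/bin/thisroot.sh \
                  /afs/ific.uv.es/user/q/qing/software/bin/thisroot.sh; do
        if [ -f "$rootsh" ]; then
            . "$rootsh"     # sets ROOTSYS, PATH, LD_LIBRARY_PATH
            echo "ROOT set up from $rootsh ($(root-config --version))"
            return 0
        fi
    done
    echo "no ROOT installation found" >&2
    return 1
}
```

Afterwards `which root` should point into the chosen installation.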

How to use python2.6

  • export PATH=/work/software/python2.6/bin:$PATH
  • export LD_LIBRARY_PATH=/work/software/python2.6/lib:$LD_LIBRARY_PATH
  • export PYTHONPATH=/work/software/python2.6
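The three exports above can be wrapped in a reusable function (note that, as in the original lines, PYTHONPATH is overwritten rather than extended):

```shell
#!/bin/sh
# Put the cluster's python2.6 build first on the relevant search paths.
setup_py26() {
    PY26=/work/software/python2.6
    export PATH="$PY26/bin:$PATH"
    export LD_LIBRARY_PATH="$PY26/lib:$LD_LIBRARY_PATH"
    export PYTHONPATH="$PY26"   # overwrites any previous PYTHONPATH
}
```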

UI Setup for SLC5

Condor setup

  • ticalui01: COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD; this is the machine for users to submit jobs
  • ticalui02: MASTER, STARTD; this machine works as a worker node (WN)

Skimming job example using Condor

  • log on to ticalui01.ific.uv.es and cd /work/offline/condor_test; condor_submit ticalui02.job; # The output file is /work/offline/condor_test/Results/data11_177986_Egamma/Data11_0jets_ticalui02.root
    • /work/offline/condor_test/ticalui02.job: condor job configuration file
    • /work/offline/condor_test/Results/data11_177986_Egamma/HWWrel16: Executable file compiled from /work/offline/tile/yesenia/Analysis/compileHWW.sh
    • /work/offline/condor_test/Results/data11_177986_Egamma/list/ticalui02.txt: paths of the lustre files to be analyzed
  • /work/offline/tile/yesenia/Analysis/submitCondor_cern/run_jobs shows you how to create multiple condor jobs.
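The contents of ticalui02.job are not reproduced on this page; a minimal HTCondor submit description for this kind of job might look like the sketch below (the executable and input-list names are taken from the paths above, but the exact attributes used on this cluster are an assumption). The helper writes the file so it can be inspected before running condor_submit:

```shell
#!/bin/sh
# Write a minimal HTCondor submit description file (a sketch; the real
# ticalui02.job may differ). Afterwards run: condor_submit ticalui02.job
write_submit_file() {
    cat > "$1" <<'EOF'
universe   = vanilla
executable = Results/data11_177986_Egamma/HWWrel16
arguments  = Results/data11_177986_Egamma/list/ticalui02.txt
output     = ticalui02.out
error      = ticalui02.err
log        = ticalui02.log
queue
EOF
}
```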

UI Setup for SLC6

  • lustre setup:
yum -y update -x kernel -x kernel-devel -x kernel-headers
# lustre kernel
yum install -y kernel-2.6.32-358.14.1.el6 kernel-devel-2.6.32-358.14.1.el6 kernel-headers-2.6.32-358.14.1.el6 net-snmp-libs > /dev/console 2>&1
/sbin/new-kernel-pkg --make-default --install 2.6.32-358.14.1.el6.x86_64 > /dev/console 2>&1
/sbin/new-kernel-pkg --mkinitrd --dracut --depmod --update 2.6.32-358.14.1.el6.x86_64 > /dev/console 2>&1

/bin/rpm -Uhv $RPM_PROXY \
   http://alpha2.ific.uv.es/linux/ific/03/el6/x86_64/RPMS/lustre-client-1.8.9-wc1_2.6.32_358.14.1.el6.x86_64.x86_64.rpm \
   http://alpha2.ific.uv.es/linux/ific/03/el6/x86_64/RPMS/lustre-client-modules-1.8.9-wc1_2.6.32_358.14.1.el6.x86_64.x86_64.rpm \
   > /dev/console 2>&1

echo "mgs01.ific.uv.es@tcp:/ificfs /lustre/ific.uv.es/ lustre ro,_netdev,user_xattr 0 0" >> /etc/fstab
echo "mgs01.ific.uv.es@tcp:/t3fs   /lustre/ific.uv.es/grid/atlas/t3 lustre ro,_netdev,user_xattr 0 0" >> /etc/fstab
mkdir -p /lustre/ific.uv.es/
  • the rest of the setup is similar to the SLC5 case
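The two `echo ... >> /etc/fstab` lines above are not idempotent: rerunning the setup duplicates the entries. A small helper (a sketch) that appends a line only if it is not already present:

```shell
#!/bin/sh
# Append a line to a file only if an identical line is not already there,
# so the lustre fstab entries are not duplicated on a second run.
append_once() {
    line=$1 file=$2
    grep -qxF "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
# e.g.: append_once "mgs01.ific.uv.es@tcp:/ificfs /lustre/ific.uv.es/ lustre ro,_netdev,user_xattr 0 0" /etc/fstab
```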

RAID recovery with /sda or /sdb on ticalui01 and ticalui02 under SLC6

  • Prepare a new disk and partition it the same way as the system disk, then wipe it as follows (here the new disk shows up as /dev/sdh):
    • dd if=/dev/zero of=/dev/sdh seek=1465148000 count=10000
    • dd if=/dev/zero of=/dev/sdh2 count=100
    • dd if=/dev/zero of=/dev/sdh1
    • dd if=/dev/zero of=/dev/sdh count=100

  • When a disk fails, pull it out, wait more than 10 seconds, then insert the new disk and copy the first 512 sectors (which include the MBR and partition table) from the surviving disk; for example, if /dev/sdb is the new disk:
    • dd if=/dev/sda of=/dev/sdb count=512
  • Add the new disk into raid
    • mdadm --manage /dev/md0 --add /dev/sdb1
    • mdadm --manage /dev/md1 --add /dev/sdb2

RAID recovery with /sda or /sdb on ticalui01 and ticalui02 under SLC5

  • Make sure that /dev/sda and /dev/sdb give the same results in 'fdisk -l' and 'parted; print'; if not, run fdisk on the disk with the key sequence 't;1;fd;t;2;fd;a;1' (set both partitions to type fd and mark partition 1 bootable), otherwise a reboot could fail with only one disk;
  • Install grub on both /dev/sda and /dev/sdb so that the system can boot from either disk. First enter the grub shell, then:
    • device (hd0) /dev/sda
    • root (hd0,0)
    • setup (hd0)
    • device (hd0) /dev/sdb
    • root (hd0,0)
    • setup (hd0)
  • remove /sdb1 and /sdb2 from md0 and md1
    • mdadm --manage /dev/md0 --fail /dev/sdb1
    • mdadm --manage /dev/md1 --fail /dev/sdb2
    • mdadm --manage /dev/md0 --remove /dev/sdb1
    • mdadm --manage /dev/md1 --remove /dev/sdb2
  • check the id of /sdb
    • dmesg | grep Attached or cat /var/log/dmesg | grep Attached
  • remove disk /sdb from the system
    • echo "scsi remove-single-device 1 0 0 0" > /proc/scsi/scsi
  • pull out /sdb and then insert a new disk
  • Add the new disk to the system:
    • echo "scsi add-single-device 2 0 0 0" > /proc/scsi/scsi; if the new partitions are mapped to sdc and sdd instead of sdb, try a reboot
    • make new /sdb1 and /sdb2 partitions with the same sizes as the previous two (the first one is 251MiB, formatted as ext3)
  • Add the 2 new partitions to md0 and md1 (use the device name the new disk actually received, here /dev/sde):
    • mdadm --manage /dev/md0 --add /dev/sde1
    • mdadm --manage /dev/md1 --add /dev/sde2
  • The rebuild will start automatically and takes about 6 hours to finish; check its status in /proc/mdstat
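The rebuild percentage can be pulled out of /proc/mdstat programmatically; a small parser (a sketch, assuming the usual mdstat recovery-line format):

```shell
#!/bin/sh
# Extract the rebuild percentage from /proc/mdstat-style input, e.g.
#   [=>...................]  recovery =  5.3% (123/4567) finish=70min
recovery_progress() {
    awk '/recovery/ { for (i = 1; i <= NF; i++) if ($i ~ /%$/) print $i }'
}
# typical use: recovery_progress < /proc/mdstat
```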

Setting up printing on IFIC machines

  • copy the directory /etc/cups from ticalui01 or any machine that already has the printers installed
  • service cups start

Permission denied problem when logging on to machines via AFS

  • lcm --configure krb5clt afsclt srvtab (How to recreate keytab file at CERN)

Can log in to the machine but cannot write to AFS:

  • lcm --reconfigure all
  • kdestroy; recreate krb5.keytab.linux krb5.keytab.windows
  • check if /tmp is full
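The last check can be scripted; a sketch that reports the usage of the filesystem holding /tmp as a percentage via POSIX `df -P`:

```shell
#!/bin/sh
# Print the usage of the filesystem containing /tmp as an integer percentage.
tmp_usage() {
    df -P /tmp | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
# e.g.: [ "$(tmp_usage)" -ge 95 ] && echo "/tmp is nearly full"
```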

Forms for new user registration:

Large file deletion in the ATLASLOCALDISK area:

  • cd /afs/cern.ch/user/q/qing/cern2ific/del_at_ific
  • source dq2_setup.sh
  • fill list.txt with srm path to be deleted
  • python del.py
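del.py itself is not reproduced on this page; a hypothetical shell equivalent of what it presumably does is sketched below (loop over list.txt and delete each SRM path; the availability of `srmrm` after sourcing dq2_setup.sh is an assumption, and DRYRUN=1 only prints what would run):

```shell
#!/bin/sh
# Delete every SRM path listed in the given file, one path per line
# (a sketch of what del.py presumably does; srmrm is assumed available).
delete_srm_list() {
    while IFS= read -r srmpath; do
        [ -z "$srmpath" ] && continue
        if [ "${DRYRUN:-0}" = 1 ]; then
            echo "would delete $srmpath"
        else
            srmrm "$srmpath"
        fi
    done < "$1"
}
```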

How to transfer files from CERN cluster to IFIC

  • log in to valtical00 and "cd /afs/cern.ch/user/q/qing/CERN2IFIC", fill list_data.txt with the dataset names, then run 'python makelist.py' to create list_files.txt
  • scp list_files.txt qing@ticalui02.ific.uv.es:./private/data_transfer/
  • open a new terminal, log on to ticalui02 and 'cd /afs/ific.uv.es/user/q/qing/private/data_transfer', then run 'python checklist.py' to create lustre_file.txt
  • go back to valtical00 and "scp qing@ticalui02.ific.uv.es:./private/data_transfer/lustre_file.txt ./", then run 'python transfer_list.py' to create files_to_transfer.txt
  • python multi_transfer.py
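checklist.py and transfer_list.py are not shown here; the list-comparison step they presumably implement (files in the CERN list that are not yet present at IFIC) can be sketched with `comm`:

```shell
#!/bin/sh
# Print the lines of $1 (all files to copy) that are absent from $2 (files
# already on lustre) -- a sketch of what transfer_list.py presumably does.
files_to_transfer() {
    a=$(mktemp); b=$(mktemp)
    sort "$1" > "$a"; sort "$2" > "$b"
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}
```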

How to set up the free version of NX server, which supports multiple users and multiple logins:

How to format a 3TB disk (http://www.cyberciti.biz/tips/fdisk-unable-to-create-partition-greater-2tb.html):

  • fdisk -l /dev/sdb
  • parted /dev/sdb
  • (parted) mklabel gpt
  • (parted) unit TB
  • (parted) mkpart primary 0.00TB 3.00TB
  • (parted) print
  • (parted) quit
  • mkfs.ext3 /dev/sdb1
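The interactive parted session above can also be run non-interactively with `parted -s`; a sketch (with DRYRUN=1 the commands are only printed, which is safer for a first look):

```shell
#!/bin/sh
# Non-interactive version of the GPT formatting steps above.
run() { if [ "${DRYRUN:-0}" = 1 ]; then echo "$@"; else "$@"; fi; }
format_gpt_disk() {
    disk=$1
    run parted -s "$disk" mklabel gpt
    run parted -s "$disk" unit TB mkpart primary 0.00TB 3.00TB
    run mkfs.ext3 "${disk}1"
}
# e.g.: DRYRUN=1 format_gpt_disk /dev/sdb   # print only, touch nothing
```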

-- Main.avalero - 13 Sep 2010

Topic attachments
Attachment | Size | Date | Who | Comment
ticalui_LVM.ks | 3.8 K | 20 Apr 2012 - 09:38 | Main.qing | KickStart configuration file to install slc5.6 on ticalui with LVM
ticalui_nonLVM.ks | 4.2 K | 20 Apr 2012 - 09:49 | Main.qing |
UI_setup.sh | 5.8 K | 20 Apr 2012 - 09:59 | Main.qing |
NXserver_setup.sh | 1.1 K | 08 Sep 2012 - 15:48 | Main.qing |
tunnel.sh | 0.4 K | 08 Sep 2012 - 15:50 | Main.qing |