r2 - 19 Jun 2019 - 08:44:55 - JavierSanchez

Job Management System

HTCondor is the resource management system that runs on this cluster. It manages the job workflow and lets users submit jobs for execution on the worker nodes. Direct access to the worker nodes is not allowed.

Each worker node has a partitionable slot that accepts jobs to be processed. HTCondor handles job ordering and execution. When a job does not require all of the node's resources, the slot is divided so that more jobs can run on the node. CPU and memory are subtracted from the main slot in chunks. A job may request 0, 1, 2 or 4 GPUs.
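
As a rough illustration of how a job declares its resource needs, the sketch below builds the text of an HTCondor submit description in Python. `request_cpus`, `request_memory` and `request_gpus` are standard HTCondor submit commands. The helper name `build_submit_description`, the script name `train.sh` and the output file names are hypothetical, and the GPU check simply mirrors the 0, 1, 2 or 4 rule stated above.

```python
ALLOWED_GPU_REQUESTS = (0, 1, 2, 4)  # GPU counts this page says are permitted


def build_submit_description(executable, cpus, memory_mb, gpus=0):
    """Return the text of an HTCondor submit description file.

    Hypothetical helper: only the submit commands themselves come from HTCondor.
    """
    if gpus not in ALLOWED_GPU_REQUESTS:
        raise ValueError(f"gpus must be one of {ALLOWED_GPU_REQUESTS}, got {gpus}")
    if cpus < 1 or memory_mb < 1:
        raise ValueError("cpus and memory_mb must be positive")

    lines = [
        f"executable     = {executable}",
        f"request_cpus   = {cpus}",
        f"request_memory = {memory_mb}",  # a bare number is read as MiB
    ]
    if gpus > 0:
        lines.append(f"request_gpus   = {gpus}")
    lines += [
        "output         = job.$(ClusterId).$(ProcId).out",  # hypothetical file names
        "error          = job.$(ClusterId).$(ProcId).err",
        "log            = job.$(ClusterId).log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


# Example: one GPU job asking for 8 cores and 32768 MB.
print(build_submit_description("train.sh", cpus=8, memory_mb=32768, gpus=1))
```

The resulting text would typically be saved to a file and passed to `condor_submit`.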

| Slot                    | CPUs (cores) | Memory (MB) | GPUs | Chunk reservation | Time limit | Comments       |
| slot1@mlwn01.ific.uv.es | 8            | 16384       | 0    | 8 cores, 16384 MB | 5 minutes  | short CPU jobs |
| slot2@mlwn01.ific.uv.es | 56           | 134964      | 0    | 8 cores, 16384 MB | 24 hours   | CPU jobs       |
| slot3@mlwn01.ific.uv.es | 32           | 231368      | 1    | 8 cores, 32768 MB | 24 hours   | GPU jobs       |
| slot1@mlwn02.ific.uv.es | 8            | 16384       | 0    | 8 cores, 16384 MB | 5 minutes  | short CPU jobs |
| slot2@mlwn02.ific.uv.es | 56           | 134964      | 0    | 8 cores, 16384 MB | 24 hours   | CPU jobs       |
| slot3@mlwn02.ific.uv.es | 32           | 231368      | 1    | 8 cores, 32768 MB | 24 hours   | GPU jobs       |
| slot1@mlwn03.ific.uv.es | 56           | 309074      | 0    | 8 cores, 32768 MB | 24 hours   | CPU jobs       |
| slot2@mlwn03.ific.uv.es | 56           | 463611      | 4    | 8 cores, 32768 MB | 24 hours   | GPU jobs       |
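
The chunk reservation column can be read as the unit in which a slot hands out CPU and memory. The sketch below assumes that a request is rounded up to whole chunks (this rounding rule is an assumption, not something stated on this page) and estimates how many chunk-sized jobs fit in a slot. The function names are hypothetical; the numbers come from the table.

```python
import math


def round_up_to_chunk(requested, chunk):
    """Round a request up to the next multiple of the chunk size (assumed policy)."""
    return math.ceil(requested / chunk) * chunk


def chunks_that_fit(slot_cpus, slot_mem_mb, chunk_cpus, chunk_mem_mb):
    """Number of whole chunks a slot can hand out, limited by the scarcer resource."""
    return min(slot_cpus // chunk_cpus, slot_mem_mb // chunk_mem_mb)


# slot2@mlwn01.ific.uv.es: 56 cores, 134964 MB, chunks of 8 cores and 16384 MB.
print(round_up_to_chunk(3, 8))                # 8
print(round_up_to_chunk(20000, 16384))        # 32768
print(chunks_that_fit(56, 134964, 8, 16384))  # 7
```

Under this assumption, a job asking for 3 cores would still occupy a full 8-core chunk, so requesting resources in multiples of the chunk size wastes the least.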

HTCondor tries to run jobs from different users in a fair-share way. Job priorities among users take into account the time each user has previously consumed, so that CPU time is distributed evenly among all users.
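
To make the fair-share idea concrete, here is a toy model in Python, loosely following the usage-based priority scheme described in the HTCondor manual: a user's priority value drifts toward their recent usage with an exponential half-life, and resources are split in inverse proportion to that value. The half-life, the floor value, the user names and the function names are all assumptions for illustration; the cluster's actual configuration is not given on this page.

```python
HALF_LIFE_S = 86400.0  # assumed half-life of one day; the cluster's setting is not given here
MIN_PRIORITY = 0.5     # assumed floor, so an idle user's value never reaches zero


def update_priority(priority, cores_in_use, dt_s, half_life_s=HALF_LIFE_S):
    """Move a user's priority value toward their current usage with exponential decay."""
    beta = 0.5 ** (dt_s / half_life_s)
    return max(MIN_PRIORITY, beta * priority + (1.0 - beta) * cores_in_use)


def fair_shares(priorities, total_cores):
    """Split cores in inverse proportion to each priority value (lower value, larger share)."""
    weights = {user: 1.0 / p for user, p in priorities.items()}
    total = sum(weights.values())
    return {user: total_cores * w / total for user, w in weights.items()}


# Two hypothetical users: alice has used 56 cores for a day, bob has been idle.
priorities = {
    "alice": update_priority(MIN_PRIORITY, cores_in_use=56, dt_s=86400),
    "bob": update_priority(MIN_PRIORITY, cores_in_use=0, dt_s=86400),
}
shares = fair_shares(priorities, total_cores=56)
print(priorities)
print({user: round(share, 1) for user, share in shares.items()})
```

In this model the recently idle user (bob) is offered most of the cores at the next allocation, and the heavy user's value decays back down once their jobs finish.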

A quick usage reference for HTCondor can be found here.

Last update: 19 Jun 2019
