qcg-pilotjobs

Differences

This shows you the differences between two versions of the page.

qcg-pilotjobs [2019/01/22 08:40]
bbosak@man.poznan.pl [QCG Pilot Job User Guide]
qcg-pilotjobs [2019/01/31 11:47] (current)
bbosak@man.poznan.pl
Line 1: Line 1:
-====== !!!! DRAFT !!!! ======
 
-====== QCG Pilot Job User Guide ======
 
-This webpage provides basic instruction how to use QCG Pilot Job mechanism in VECMA environment. For the reference documentation of the tool please go to [[ https://github.com/compat-project/QCG-PilotJob/blob/master/README.md | this link ]]
+====== QCG PilotJob User Guide ======
 
-===== What is QCG Pilot Job? =====
+This webpage provides basic instructions on how to use the [[ https://github.com/vecma-project/QCG-PilotJob | QCG PilotJob ]] tool in the VECMA environment. For the complete reference documentation of the tool please go to [[ https://github.com/vecma-project/QCG-PilotJob/blob/master/README.md | this link ]]
-The QCG Pilot Job is a computing job playing a role of a container over a number of subordinate computing jobs. It allows to execute many subordinate jobs in a single scheduling system allocation. Direct submission of a large group of jobs to a scheduling system can result in long aggregated time to finish as each single job is scheduled independently and waits in a queue. On the other hand the submission of a group of jobs can be restricted or even forbidden by administrative policies defined on clusters. One can argue that there are available job array mechanisms in many systems, however the traditional job array mechanism allows to run only bunch of jobs having the same resource requirements while jobs being parts of a multiscale simulation by nature vary in requirements and therefore need more flexible solutions. On a technical level QCG Pilot Job is exposed as a lightweight service QCG Pilot Job Manager
+
 
-===== How to use QCG Pilot Jobs ? =====
+===== What is QCG PilotJob? =====
-QCG Pilot Job Manager is implemented in Python and as so may be utilised with the pure Python interpreter. Typically it is started on computing nodes utilising a small part of allocation. This may be quite tedious to run by hand, thus, for user convenience, QCG Pilot Job Manager has been integrated with the QCG middleware. Currently it may be easily used with the help of QCG-Client tool, what we recommend and describe later.
+QCG PilotJob is a computing job that acts as a container for a number of subordinate computing jobs. It allows many subordinate jobs to be executed within a single scheduling system allocation. Direct submission of a number of separate jobs to a scheduling system can result in a long aggregated time to finish, as each job is scheduled independently and waits in a queue. Moreover, the submission of numerous jobs can be restricted or even forbidden by administrative policies defined on clusters. One can argue that job array mechanisms are available in many systems; however, a traditional job array can run only a bunch of jobs with the same resource requirements, while jobs that are parts of a multiscale simulation vary in their requirements by nature and therefore need more flexible solutions. On a technical level QCG PilotJob is exposed as a lightweight service - QCG PilotJob Manager.
+
+===== How to use QCG PilotJob ? =====
+QCG PilotJob Manager is implemented in Python and as such may be run with the plain Python interpreter. Typically it is started on computing nodes, utilising a small part of the allocation. Running it by hand may be quite tedious; thus, for user convenience, QCG PilotJob Manager has been integrated with the QCG middleware. Currently it may be easily used with the help of the [[ http://www.qoscosgrid.org/trac/qcg-broker/wiki/client_user_guide | QCG-Client tool ]], which we recommend and describe later.
        
-QCG Pilot Job Manager offers two basic modes of its usage: static and dynamic. In both modes a user starts QCG Pilot Job Manager service, however while in the static mode the way of execution of subordinate jobs is known in advance, in the dynamic mode, the subordinate jobs are added to the QCG Pilot Job Manager service on-demand programmatically, using predefined network API (e.g. from the python program).
+QCG PilotJob Manager offers two basic modes of usage: static and dynamic. In both modes a user starts the QCG PilotJob Manager service; however, while in the static mode the way of executing subordinate jobs is known in advance, in the dynamic mode the subordinate jobs are added to the QCG PilotJob Manager service on demand, programmatically, using a predefined network API (e.g. from a Python program).
 
-At this moment QCG Pilot Job Manager has been deployed on the Eagle cluster in PSNC. There have been registered two QCG applications with the names qcg-pm and qcg-pm-client for execution of static and dynamic mode respectively.
+At this moment QCG PilotJob Manager has been deployed on the Eagle cluster at PSNC. Two QCG applications have been registered, named qcg-pm and qcg-pm-client, for execution in the static and dynamic mode respectively.
 
-In the ''/home/plgrid-groups/plggvecma/Common/QCGPilotJob/examples/'' on ''qcg.man.poznan.pl'' machine there are available example files for the QCG Pilot Job system. In order to try them, please ssh to the machine:
+Example files for the QCG PilotJob system are available in ''/home/plgrid-groups/plggvecma/Common/QCGPilotJob/examples/'' on the ''qcg.man.poznan.pl'' machine. In order to try them, please ssh to the machine:
 
 ''ssh plg*****@qcg.man.poznan.pl''
Line 27: Line 27:
 In the next part of the tutorial we will assume that the copied example files are located in ''~/pj-examples''
 
-==== Static QCG Pilot Job ====
+==== Static QCG PilotJob ====
 Example input files: ''~/pj-examples/static''.
 
-To run QCG Pilot Job with QCG-Client in the static mode, only two following elements are needed:
-  - The QCG description file for starting the QCG Pilot Job Manager service
-  - The Pilot Job execution procedure being an input to the QCG Pilot Job Manager job.
+To run QCG PilotJob with QCG-Client in the static mode, only the following two elements are needed:
+  - The QCG description file for starting the QCG PilotJob Manager service
+  - The PilotJob execution procedure being an input to the QCG PilotJob Manager job.
 
 The QCG description file should be formatted in the way typical for QCG jobs. For the static execution, the application should be set to ''qcg-pm''. An example QCG description file, ''example-static.qcg'', looks as follows:<code>
Line 47: Line 47:
 Here, we selected that our calculations should be performed on Eagle, using 4 cores, with walltime set to 10 minutes, and so on... We also defined that the output from a job, when it finishes, should be automatically downloaded to the ''eagle.wd.${JOB_ID}'' directory.
 
-An argument given to the ''qcg-pm'' application is the Pilot Job execution procedure constructed accordingly to the JSON-based [[https://github.com/compat-project/QCG-PilotJob/blob/master/README.md#file-based-interface | QCG Pilot Job Manager file interface]]. In order to illustrate the basic capabilities of this interface, we uploaded the ''example-static.json'' file that is a simple workflow of two ''/bin/date'' invocations separated by 10 second ''sleep''. Its content is presented below: <code>
+The argument given to the ''qcg-pm'' application is the PilotJob execution procedure constructed according to the JSON-based [[https://github.com/vecma-project/QCG-PilotJob/blob/master/README.md#file-based-interface | QCG PilotJob Manager file interface]]. In order to illustrate the basic capabilities of this interface, we uploaded the ''example-static.json'' file, which is a simple workflow of two ''/bin/date'' invocations separated by a 10-second ''sleep''. Its content is presented below: <code>
 [
 {
Line 107: Line 107:
 </code>
 
-This is very basic scenario, but in a similar way QCG Pilot Job system supports definition of more advanced use cases, e.g scenarios including loops and/or parallel processing.
+In this example, the first executed job will be ''date1'', then ''sleep'' and then ''date2''. Note that the order in which jobs are specified matters when there are dependencies between tasks. Thus, the ''sleep'' job has to be defined after ''date1'', and ''date2'' after ''sleep''.
+
+This is a very basic scenario, but in a similar way the QCG PilotJob system supports the definition of more advanced use cases, e.g. scenarios including loops and/or parallel processing.
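
The ordering rule for dependent jobs (a job must be defined after the jobs it depends on) can be checked mechanically before submission. The following is a minimal Python sketch, not part of QCG PilotJob itself: the function name ''find_order_violations'' is hypothetical, and the job layout (a ''name'' field plus an optional ''dependencies''/''after'' list) is an assumption made for illustration only; the file-interface documentation remains the authority on the actual schema.

```python
from typing import Dict, List


def find_order_violations(jobs: List[Dict]) -> List[str]:
    """Report jobs that depend on a job not defined earlier in the list.

    ASSUMPTION: each job is a dict with a "name" key and, optionally,
    {"dependencies": {"after": [<job names>]}}. This layout is illustrative
    and may differ from the real QCG PilotJob file interface.
    """
    seen = set()      # names of jobs defined so far
    problems = []     # human-readable descriptions of violations
    for job in jobs:
        for dep in job.get("dependencies", {}).get("after", []):
            if dep not in seen:
                problems.append(
                    f"{job['name']} depends on {dep}, "
                    "which is not defined before it"
                )
        seen.add(job["name"])
    return problems


# Hypothetical three-job workflow mirroring date1 -> sleep -> date2.
workflow = [
    {"name": "date1"},
    {"name": "sleep", "dependencies": {"after": ["date1"]}},
    {"name": "date2", "dependencies": {"after": ["sleep"]}},
]

# Correct order: no violations are reported.
print(find_order_violations(workflow))

# Reversed order: both dependent jobs are reported.
for message in find_order_violations(list(reversed(workflow))):
    print(message)
```

Such a check only validates the definition order; it does not start or schedule anything.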
  
-Now, when there are defined all inputs to QCG Pilot Job, the job description can be submitted to QCG with ''qcg-sub'' command:
+Now that all inputs to QCG PilotJob are defined, the job description can be submitted to QCG with the ''qcg-sub'' command:
 
 ''qcg-sub example-static.qcg''
 
-==== Dynamic QCG Pilot Job ====
+==== Dynamic QCG PilotJob ====
 Example input files: ''~/pj-examples/dynamic''.
 
-QCG Pilot Job Manager provides API (described [[https://github.com/compat-project/QCG-PilotJob/blob/master/README.md#api | here ]]) that can be used to dynamically add/delete/manage its sub-jobs via the network interface.
+QCG PilotJob Manager provides a [[https://github.com/vecma-project/QCG-PilotJob/blob/master/README.md#api | Python API ]] that can be used to dynamically add/delete/manage its sub-jobs via the network interface.
 
-In the base scenario, we can assume that QCG Pilot Job Manager and the program using API are executed in the same allocation. To easily start such scenarios there was developed a dedicated QCG application wrapper called ''qcg-pm-client'' that can be selected from QCG-Client tool. This time, instead of the execution procedure, a file with python code using API should be defined as an input argument.
+In the base scenario, we can assume that QCG PilotJob Manager and the program using the API are executed in the same allocation. To easily start such scenarios, a dedicated QCG application wrapper called ''qcg-pm-client'' was developed; it can be selected from the QCG-Client tool. This time, instead of the execution procedure, a file with Python code using the API should be defined as the input argument.
 
 Below we present the content of the example ''example-dynamic.qcg'' file:<code>
Line 131: Line 133:
 </code>
 
-As you can see, the application is set to ''qcg-pm-client'' and the input argument points to a ''example-dynamic.py'' python file. From this python file, with support of ''qcg.appscheduler.api'' package, the communication with QCG Pilot Job Manager may be realised e.g. to submit new sub-jobs. Let us show an example code stored in the ''example-dynamic.py'' file. You can note that this code realizes the same workflow as was defined in the previously considered static execution<code>
+As you can see, the application is set to ''qcg-pm-client'' and the input argument points to the ''example-dynamic.py'' Python file. From this Python file, with the support of the ''qcg.appscheduler.api'' package, communication with QCG PilotJob Manager may be realised, e.g. to submit new sub-jobs. Let us show the example code stored in the ''example-dynamic.py'' file. Note that this code realises the same workflow as was defined in the previously considered static execution:<code>
 import zmq
  
Line 161: Line 163:
 </code>
 
-===== QCG Pilot Job Reference Documentation =====
-In order to get complete documentation of a tool please go to [[ https://github.com/compat-project/QCG-PilotJob/blob/master/README.md | this link ]]
+===== QCG PilotJob Reference Documentation =====
+In order to get the complete documentation of the tool please go to [[ https://github.com/vecma-project/QCG-PilotJob/blob/master/README.md | this link ]]
  
  
qcg-pilotjobs.1548146417.txt.gz · Last modified: 2019/01/22 08:40 by bbosak@man.poznan.pl