CondorCycleSeeder

The name chosen for the pool (CONDOR_HOST) is uc3-mgt.mwt2.org. By default the collector/negotiator therefore advertises that hostname, which is not resolvable from outside, and communicates from the private IP (10.1.3.95), which is not routable from outside. This is not a problem at the moment, since all the hosts intended to flock into the cluster are inside the 10.x.x.x network. To change that, see the section "How to make Cycle seeder Condor pool visible outside the 10.x network".

uc3-mgt install & config

manager configurations

Files:
  • /etc/condor/condor_config is the one configured for the MWT2
  • Configuration directory for special custom configurations is /etc/condor/config.d, e.g. mathematica
  • Local customizations are in /etc/condor/condor_config.local
  • Temporary per-node customizations can be made in /etc/condor/condor_config.override

Notes on the setup:
  • condor host is uc3-mgt.mwt2.org
  • domain is osg-gk.mwt2.org to allow MWT2 jobs to run using the same accounts as on MWT2
  • SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = TRUE should not be needed, but it does not hurt either
  • Local schedd (only for testing): DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD
  • Authentications:
    ALLOW_WRITE = $(ALLOW_WRITE), *.mwt2.org, *.uchicago.edu, uct2-*.uchicago.edu, iut2-*.iu.edu
    INTERNAL_IPS = 10.1.3.* 10.1.4.* 10.1.5 128.135.158.241 uct2-6509.uchicago.edu
  • Flocking: FLOCK_FROM = uc3-cloud.uchicago.edu, uc3-cloud.mwt2.org, uc3-sub.uchicago.edu, uc3-sub.mwt2.org, ui-cr.uchicago.edu, ui-cr.mwt2.org, millikan.uchicago.edu, osg-gk.mwt2.org, condor.mwt2.org
  • Condor view: CONDOR_VIEW_HOST = uc3-cloud.uchicago.edu:39618
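
The notes above could be collected into /etc/condor/condor_config.local roughly as follows. This is a sketch, not the file actually deployed; in particular, mapping "domain" to UID_DOMAIN is an assumption, since the notes do not name the knob.

```
# /etc/condor/condor_config.local on uc3-mgt (sketch)
CONDOR_HOST = uc3-mgt.mwt2.org

# Assumption: the "domain" in the notes refers to UID_DOMAIN
UID_DOMAIN = osg-gk.mwt2.org

# Central manager daemons plus a local schedd used only for testing
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD

# Should not be needed here, but harmless
SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = TRUE

# Hosts allowed to write to (e.g. advertise into) this pool
ALLOW_WRITE = $(ALLOW_WRITE), *.mwt2.org, *.uchicago.edu, uct2-*.uchicago.edu, iut2-*.iu.edu

# Submit hosts allowed to flock jobs into this pool
FLOCK_FROM = uc3-cloud.uchicago.edu, uc3-cloud.mwt2.org, \
             uc3-sub.uchicago.edu, uc3-sub.mwt2.org, \
             ui-cr.uchicago.edu, ui-cr.mwt2.org, \
             millikan.uchicago.edu, osg-gk.mwt2.org, condor.mwt2.org

# Forward pool statistics to the CondorView collector
CONDOR_VIEW_HOST = uc3-cloud.uchicago.edu:39618
```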

uc3-c0[20] computes

  • These nodes were built from a uct2 worker node image, so the existing Condor installation must be removed first. Start on uc3-c001.
    [root@uc3-c001 condor]# rpm -qa | grep condor
    condor-7.6.4-1.x86_64
    [root@uc3-c001 condor]# rpm -e condor-7.6.4-1.x86_64
    Shutting down Condor (fast-shutdown mode)...  done.
    warning: /etc/condor/condor_config.local saved as /etc/condor/condor_config.local.rpmsave
    warning: /etc/condor/condor_config saved as /etc/condor/condor_config.rpmsave
    [root@uc3-c001 condor]# 
  • Download the repo package
  • The yum.repos.d directory then contains:
    [root@uc3-c001 yum.repos.d]# ls
    cobbler-config.repo  condor-development-rhel5.repo  dell-omsa-repository.repo  vdt.repo
  • Then run yum install condor.x86_64 (install log: InstallLogCondorCompute)

compute node configurations

Files:
  • /etc/condor/condor_config is the one configured for the MWT2
  • Configuration directory for special custom configurations is /etc/condor/config.d, e.g. mathematica
  • Local customizations are in /etc/condor/condor_config.local
  • Temporary per-node customizations can be made in /etc/condor/condor_config.override

Notes on the setup:
  • condor host is uc3-mgt.mwt2.org
  • domain is osg-gk.mwt2.org to allow MWT2 jobs to run using the same accounts as on MWT2
  • all flocking jobs from different domains use the uc3 account
  • nodecheck is executed to set the node online/offline, pandaid has been removed
  • authentication forwarding from the Negotiator is enabled

The configuration of the mathematica nodes should be split out and put in the configuration directory, /etc/condor/config.d.
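
One possible way to carry out such a split, shown only as a sketch: LOCAL_CONFIG_DIR makes Condor read every file in the named directory, so the mathematica settings could live in a file of their own. The file name mathematica.conf and the HAS_MATHEMATICA attribute are hypothetical placeholders, not taken from the current setup.

```
# In /etc/condor/condor_config: read every file found in the directory
LOCAL_CONFIG_DIR = /etc/condor/config.d

# In /etc/condor/config.d/mathematica.conf (hypothetical file name),
# installed only on the mathematica nodes.
# Advertise a custom attribute in the machine ClassAd (hypothetical name)
HAS_MATHEMATICA = True
STARTD_ATTRS = $(STARTD_ATTRS), HAS_MATHEMATICA
```

With an attribute like this, jobs could select those nodes through their Requirements expression, and the main condor_config would stay identical on every compute node.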

Optional configurations:

How to make Cycle seeder Condor pool visible outside the 10.x network

The head node already listens on all network interfaces, but there may be problems with the IP address it advertises when it initiates communication, and with reaching the worker nodes. The necessary steps are:
  1. Configure the primary IP as the public one:
    # so that machines outside can contact it
    # These commented lines should be the default
    #COLLECTOR.BIND_ALL_INTERFACES = TRUE
    #NEGOTIATOR.BIND_ALL_INTERFACES = TRUE
    #BIND_ALL_INTERFACES = TRUE
    NETWORK_INTERFACE = 129.93.229.141
    
  2. Configure CONDOR_HOST=uc3-mgt.uchicago.edu on the worker nodes.
To allow the Schedd to contact the worker nodes, CCB should be set up, because the worker nodes are not reachable from outside the network (although they do have outbound connectivity):
  1. On all the worker nodes (supposing that the Collector is also CCB):
    CCB_ADDRESS = $(COLLECTOR_HOST)
  2. Give the condor_collector acting as the CCB server a large enough file descriptor limit, e.g. twice 500*(1+1+8), which is 10000:
    COLLECTOR.MAX_FILE_DESCRIPTORS = 10000
Optionally, you can configure the worker nodes and the head node as part of the same private 10.x network, so that traffic between them stays inside that network without crossing the router:
  1. Set the private network on the Collector/Negotiator:
    PRIVATE_NETWORK_INTERFACE=10.1.3.95
    PRIVATE_NETWORK_NAME=uc3-mgt.mwt2.org
  2. And the network name on the worker nodes:
    PRIVATE_NETWORK_NAME=uc3-mgt.mwt2.org
The worker nodes (and the uc3-sub submit host) are already configured to trust authentications forwarded by the Negotiator:
# To allow transitive authentication
SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = True
ALLOW_DAEMON = submit-side@matchsession
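
Taken together, the worker-node side of the steps in this section might end up looking like the fragment below. It is a sketch assembled from the settings listed above, not a copy of a deployed file; the private-network line applies only if the optional step is used.

```
# Worker node settings for a pool visible outside the 10.x network (sketch)
CONDOR_HOST = uc3-mgt.uchicago.edu

# Let daemons outside the network reach this node through the
# CCB server, here assumed to be the collector
CCB_ADDRESS = $(COLLECTOR_HOST)

# Optional: must match the value set on the Collector/Negotiator
PRIVATE_NETWORK_NAME = uc3-mgt.mwt2.org

# To allow transitive authentication
SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = True
ALLOW_DAEMON = submit-side@matchsession
```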

-- RobGardner - 03 Mar 2012
Topic revision: r3 - 31 May 2012, MarcoMambelli