Condor Notes

Questions

Does Condor use UIDs or user names?
Consider the case where the mapping differs between systems.
From the documentation it seems that the UID is what gets matched.
- what happens with LDAP?
- across different OSes?

What does "end the same" mean in: "Condor compares the submit machine's claimed value for UID_DOMAIN to its fully qualified name. If the two do not end the same, then the submit machine is presumed to be lying about its UID_DOMAIN."?
- is one a substring of the other?
- is it everything after the first dot?
- how do I manage several submit hosts in the same pool with the same UIDs? Do I need TRUST_UID_DOMAIN?
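
One plausible reading of "end the same" is a case-insensitive suffix match on whole domain components. This is an assumption, not confirmed against the Condor source. The function name ends_the_same is made up for this sketch:

```python
def ends_the_same(uid_domain, fqdn):
    """Assumed rule: fqdn equals uid_domain or ends with '.' + uid_domain."""
    # Compare case-insensitively and ignore any leading dots in the domain.
    domain = uid_domain.lower().lstrip(".")
    host = fqdn.lower()
    if not domain:
        return False
    # Require a dot boundary so that "isc.edu" does not match "wisc.edu".
    return host == domain or host.endswith("." + domain)
```

Under this reading, several submit hosts such as submit1.cs.wisc.edu and submit2.cs.wisc.edu could share UID_DOMAIN = cs.wisc.edu without TRUST_UID_DOMAIN, while a UID_DOMAIN unrelated to the host name would need it.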

http://research.cs.wisc.edu/condor/manual/v7.7/3_6Security.html
http://research.cs.wisc.edu/condor/manual/v7.7/3_3Configuration.html#17150

If I change UID_DOMAIN, reconfigure and restart the machines, the old domain is still there.
It changed only much later.


How to

Get debug output

Debug options from the Condor manual: http://research.cs.wisc.edu/condor/manual/v7.9/3_3Configuration.html#param:SubsysDebug

Components:
  • gridmanager: involved in blahp jobs
  • negotiator: performs the matching
GRIDMANAGER_DEBUG = D_FULLDEBUG

Others that I've seen used:
LEASEMANAGER_DEBUG = D_FULLDEBUG
GRIDMANAGER_DEBUG = D_FULLDEBUG

Determine if pool password stored on each machine:

condor_status -f "%s\t" Name -f "%s\n" ifThenElse(isUndefined(LocalCredd),\"UNDEF\",LocalCredd) 
If any machine has UNDEF associated with it then the pool password was not stored correctly.

Things to try:

Play with universes, Condor-C

Documentation of the different universes, what is needed, different parameters, tips: http://research.cs.wisc.edu/condor/manual/v7.7/5_3Grid_Universe.html

Try Condor to Condor submission (benchmark, other tests):
  • flocking
  • condor-c
  • or bosco (grid - condor)

SOAP interface

References:

Configuration

Condor configuration: macros and domains, predefined variables http://research.cs.wisc.edu/condor/manual/v7.9/3_3Configuration.html

When should I use _ or .

Order of configuration files

Variable definitions are read in the following order:
  • CONDOR_CONFIG file is read first (/etc/condor/condor_config)
  • Then the variables LOCAL_CONFIG_DIR and LOCAL_CONFIG_FILE are read:
      • The directory and all the files in it are checked first. Can it be a directory list?
      • The file, whose value is the one in effect after the directory files have been read, is checked next. It can be a file list.
  • If a file contains a new definition of the directory or of the file (list) itself, they are checked again and parsed after parsing that file.
  • NOTE that changing the value of a variable used in the definition of another variable (e.g. changing ETC when LOCAL_CONFIG_DIR=$(ETC)/config.d) is enough to change the value of that variable and have it rescanned afterwards.

Configuration can also be obtained from a program, by naming it in CONDOR_CONFIG or LOCAL_CONFIG_FILE and ending the value with a "|", e.g.
LOCAL_CONFIG_FILE = /bin/make_the_config|
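
A program named this way only has to print configuration lines on its standard output. A minimal sketch, assuming a hypothetical script installed as /bin/make_the_config. The macros it emits and the rule used to derive the domain are illustrative choices, not taken from a real pool:

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for /bin/make_the_config: prints Condor config on stdout."""
import socket


def make_config(hostname):
    """Return configuration text for the given host name (illustrative values)."""
    # Assumption: everything after the first dot is used as the domain;
    # a host name without dots is used unchanged.
    domain = hostname.partition(".")[2] or hostname
    lines = [
        "# generated by make_the_config (hypothetical)",
        "UID_DOMAIN = " + domain,
        "FILESYSTEM_DOMAIN = " + domain,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The configuration is whatever the program writes to standard output.
    print(make_config(socket.getfqdn()), end="")
```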

After the files are evaluated, the environment is scanned for variables prefixed with _CONDOR_ (or _condor_). The tools strip off the prefix, and utilize what remains as configuration. As the use of environment variables is the last within the ordered evaluation, the environment variable definition is the one used. The security of the system is not compromised, as only specific variables are considered for definition in this manner, not any environment variables with the _CONDOR_ prefix.
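
The override step can be sketched as follows, assuming the prefix is spelled _CONDOR_ or _condor_ including the underscores. The function names are made up for this sketch, and it does not model the restriction to specific variables mentioned above:

```python
PREFIXES = ("_CONDOR_", "_condor_")


def env_overrides(environ):
    """Return {parameter name: value} for variables carrying a Condor prefix."""
    overrides = {}
    for name, value in environ.items():
        for prefix in PREFIXES:
            # Skip variables that consist of the bare prefix only.
            if name.startswith(prefix) and len(name) > len(prefix):
                overrides[name[len(prefix):]] = value
                break
    return overrides


def effective_config(file_config, environ):
    """Apply environment definitions last, so they win over file values."""
    merged = dict(file_config)
    merged.update(env_overrides(environ))
    return merged
```

Passing os.environ as the second argument applies the overrides of the current process environment, e.g. after export _CONDOR_UID_DOMAIN=example.org in the shell.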

Order of variable lookup

The ordering used to look up a variable, called <parameter name>:
  1. <subsystem name>.<local name>.<parameter name>
  2. <local name>.<parameter name>
  3. <subsystem name>.<parameter name>
  4. <parameter name>
If this local name is not specified on the command line, numbers 1 and 2 are skipped. As soon as the first match is found, the search is completed, and the corresponding value is used.
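
The lookup order above can be sketched like this. The function and its dictionary-based config are illustrative only, and the sketch treats names as plain case-sensitive strings:

```python
def lookup(config, parameter, subsystem, local_name=None):
    """Return the value of the first matching key, or None if nothing matches."""
    candidates = []
    if local_name is not None:
        # Steps 1 and 2 apply only when a local name was given.
        candidates.append(f"{subsystem}.{local_name}.{parameter}")
        candidates.append(f"{local_name}.{parameter}")
    candidates.append(f"{subsystem}.{parameter}")  # step 3
    candidates.append(parameter)                   # step 4
    for key in candidates:
        if key in config:
            return config[key]  # the first match ends the search
    return None
```

For example, with both SCHEDD.MAX_JOBS and MAX_JOBS defined (hypothetical names), a lookup for the SCHEDD subsystem returns the first one, and any other subsystem falls through to the plain MAX_JOBS.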

Logging settings

Settings in <SUBSYS>_DEBUG, ALL_DEBUG (all subsystems), TOOL_DEBUG (execution of tools/commands logged to stderr):
  • D_ALL - all messages, huge
  • D_FULLDEBUG - generic debug flag
  • Specific flags - D_DAEMONCORE, D_PRIV, D_COMMAND, D_LOAD, D_KEYBOARD, D_JOB, D_MACHINE, D_SYSCALLS, D_MATCH, D_NETWORK, D_HOSTNAME, D_CKPT, D_SECURITY, D_PROCFAMILY, D_ACCOUNTANT, D_PROTOCOL
  • Formatting flags - D_PID, D_FDS, D_CATEGORY

Some interesting settings

  • USER_JOB_WRAPPER

Multiple schedd

http://www.cyclecomputing.com/wiki/index.php?title=Running_Multiple_Condor_Schedds

See also: http://research.cs.wisc.edu/htcondor/manual/v7.9/3_3Configuration.html#SECTION00431200000000000000

Shared secret to forward authentication: USE_MATCH_AUTH / SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION

SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = True
This Condor attribute has to be set for match authentication to work for schedd daemons, and therefore on the WMS Collector and Submit services. This is a special authentication mechanism designed to minimize overhead in the condor_schedd when communicating with the execute machine. Essentially, matchmaking results in a secret being shared between the condor_schedd and condor_startd, and this is used to establish a strong security session between the execute and submit daemons without going through the usual security negotiation protocol. This is especially important when operating at large scale over high latency networks (e.g. a glidein pool with one schedd and thousands of startds on a network with 0.1 second round trip times). The default value for this configuration option is False. To have any effect, it must be True in the configuration of both the execute side (startd) as well as the submit side (schedd). When this authentication method is used, all other security negotiation between the submit and execute daemons is bypassed. All inter-daemon communication between the submit and execute side will use the startd's settings for SEC_DAEMON_ENCRYPTION and SEC_DAEMON_INTEGRITY; the configuration of these values in the schedd, shadow, and starter is ignored.

Important: For strong security, at least one of the two, integrity or encryption, should be enabled in the startd configuration. Also, some form of strong mutual authentication (e.g. GSI) should be enabled between all daemons and the central manager or the shared secret which is exchanged in matchmaking cannot be safely encrypted when transmitted over the network.

The schedd and shadow will be authenticated as submit-side@matchsession when they talk to the startd and starter. The startd and starter will be authenticated as execute-side@matchsession when they talk to the schedd and shadow. On the submit side, authorization of the execute side happens automatically. On the execute side, it is necessary to explicitly authorize the submit side. Example:
ALLOW_DAEMON = submit-side@matchsession/192.168.123.*

When SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION is true, execute-side@matchsession is automatically granted READ access to the condor_schedd and DAEMON access to the condor_shadow.

http://research.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:SecEnableMatchPasswordAuthentication

SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION This is a special authentication mechanism designed to minimize overhead in the condor_schedd when communicating with the execute machine. Essentially, matchmaking results in a secret being shared between the condor_schedd and condor_startd, and this is used to establish a strong security session between the execute and submit daemons without going through the usual security negotiation protocol. This is especially important when operating at large scale over high latency networks (for example, on a pool with one condor_schedd daemon and thousands of condor_startd daemons on a network with a 0.1 second round trip time). The default value for this configuration option is False. To have any effect, it must be True in the configuration of both the execute side (condor_startd) as well as the submit side (condor_schedd). When this authentication method is used, all other security negotiation between the submit and execute daemons is bypassed. All inter-daemon communication between the submit and execute side will use the condor_startd daemon's settings for SEC_DAEMON_ENCRYPTION and SEC_DAEMON_INTEGRITY; the configuration of these values in the condor_schedd, condor_shadow, and condor_starter are ignored.


The condor_schedd and condor_shadow will be authenticated as submit-side@matchsession when they talk to the condor_startd and condor_starter. The condor_startd and condor_starter will be authenticated as execute-side@matchsession when they talk to the condor_schedd and condor_shadow. On the submit side, authorization of the execute side happens automatically. On the execute side, it is necessary to explicitly authorize the submit side. Example:

ALLOW_DAEMON = submit-side@matchsession/192.168.123.*

Replace the example netmask with something suitable for your situation.

Default Authentication Methods (as of 7.8)

The setting
SEC_DEFAULT_AUTHENTICATION_METHODS = KERBEROS, NTSSPI
indicates that either Kerberos or Windows authentication may be used, but Kerberos is preferred over Windows. Note that if the client and daemon agree that multiple authentication methods may be used, then they are tried in turn. For instance, if they both agree that Kerberos or NTSSPI may be used, then Kerberos will be tried first, and if there is a failure for any reason, then NTSSPI will be tried.

An additional specialized method of authentication exists for communication between the condor_schedd and condor_startd. It is especially useful when operating at large scale over high latency networks or in situations where it is inconvenient to set up one of the other methods of strong authentication between the submit and execute daemons. See the description of SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION above for details.

If the configuration for a machine does not define any variable for SEC_<access-level>_AUTHENTICATION, then HTCondor uses a default value of OPTIONAL. Authentication will be required for any operation which modifies the job queue, such as condor_qedit and condor_rm. If the configuration for a machine does not define any variable for SEC_<access-level>_AUTHENTICATION_METHODS, the default value for a Unix machine is FS, KERBEROS, GSI. The default value for a Windows machine is NTSSPI, KERBEROS, GSI.

UID_DOMAIN

The examples mention user names, but the text states that user IDs are supposed to be the same.
  • Unless TRUST_UID_DOMAIN is true, Condor compares UID_DOMAIN with the hostname of the submit host and looks for a match. If they do not match, the submit host is considered to be lying and no matching UID is used. So choosing the hostname of the schedd as UID_DOMAIN is a good way to simplify the configuration.
  • Then, unless you set SOFT_UID_DOMAIN, if the UID is not in the password file and does not match the user name, the job fails to start.

References:

Some parameters

  • FLOCK_TO: hosts that can run jobs submitted on this pool. It would allow daisy-chaining. Used by the schedd only. The schedd first attempts to run jobs locally, then increases the flocking level.
  • FLOCK_FROM: hosts that will be allowed to run jobs on this cluster. Used by the startd (or starter?) and the negotiator?

  • UID_DOMAIN: set to a unique name for this resource. If this is the same, Condor jobs can be submitted to the nodes and they appear in the same queue. There is no connection to the host domain (from FQDN).
  • FILESYSTEM_DOMAIN: if this string matches, then Condor can copy the files if a path is the same (how are paths selected? home directory, which others?). This can be the same as UID_DOMAIN. There is no connection to the host domain (from FQDN).

CCB configuration or Firewall

On the scheduler or the resource (startd), set:
  • CCB_ADDRESS = $(COLLECTOR_HOST) - Negotiator, in a public network
  • PRIVATE_NETWORK_NAME = cs.wisc.edu - Hosts in the same private network will not use CCB to communicate
Remember to also set up the security configuration.

Some security level notes:
  • ADVERTISE_STARTD New/updated ClassAds from startd. Defaults to DAEMON
  • DAEMON This access level is used for commands that are internal to the operation of Condor. An example of this internal operation is when the condor_startd daemon sends its ClassAd updates to the condor_collector daemon (which may be more specifically controlled by the ADVERTISE_STARTD access level). Authorization at this access level should only be given to the user account under which the Condor daemons run. The DAEMON level of access implies both READ and WRITE access. Any setting for this access level that is not defined will default to the corresponding setting in the WRITE access level.
  • WRITE The WRITE level of access implies READ access.
Host based authentication:

Links:

-- MarcoMambelli - 04 May 2011
Topic revision: r14 - 12 Jul 2013, MarcoMambelli