Tier3 Cluster Flocking into ATLAS Connect 
  Overview 
There are only two steps needed to allow your cluster to flock jobs into 
ATLAS Connect. Jobs are routed by the Remote Cluster Connect Factory (RCCF), which handles HTCondor submissions into the various connected clusters participating in ATLAS Connect.
 
-  MWT2 Administrators must enable access for your Local SCHEDD node.
  -  You must enable GSI security and add the MWT2 Remote Cluster Connect Server name to your Local SCHEDD FLOCK_TO HTCondor variable.
 
 
Note:
It is expected that your Local Site Administrator acts as a registration agent on behalf of their institution's users and assumes responsibility for the actions of this user community. 
   Ask MWT2 Administrators to enable access 
Send an email to the MWT2 Administrators, 
support@mwt2.org, with the following information
 
-  Full name of the organization, such as "University of Illinois at Urbana-Champaign", "University of Chicago", "Argonne National Lab" or "Duke University"
  -  Your site's nickname, which should be taken from the institution's domain name: "uiuc", "uchicago", "duke", "anl"
  -  Your site's Administrator/Contact name(s) and email address(es)
  -  Your Local Site SCHEDD host fully qualified domain name (FQDN)
  -  Your Local Site SCHEDD host Distinguished Name (DN)
 
 
The MWT2 Administrators will respond with a Remote Cluster Connect (RCC) Factory Port used by the RCC Factory on the RCC Factory Server. 
This port number is needed when setting up the RCC Flocking.   
  Certificate Authority and Host Certificate are required 
Flocking to a RCC Factory Server requires GSI security to be used by the Local Site HTCondor installation. 
GSI requires that your Local Site SCHEDD host have a functioning Certificate Authority (CA) (/etc/grid-security/certificates).
This SCHEDD host must also have a valid host certificate (/etc/grid-security/host[cert,key].pem) which provides the Distinguished Name (DN) of the host.
If the SCHEDD host does not have a functional CA, directions on how to install a CA are located at 
Installing Certificate Authorities Certificates and related RPMs
If the SCHEDD host does not have a host certificate, one can be requested in two ways
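Whichever way the certificate is obtained, once it is in place you can confirm the DN that will be reported to MWT2 (assuming standard OpenSSL tooling is installed):
# Display the host certificate Distinguished Name (DN)
openssl x509 -in /etc/grid-security/hostcert.pem -noout -subject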
  
  Additions to your Local Site HTCondor 
The following lines need to be added to 
/etc/condor/condor_config.local or added as a drop-in module at 
/etc/condor/config.d on the Local Site SCHEDD host.
After adding these lines you need to issue a 
condor_reconfig command for the changes to take effect.
The following is an example of a drop-in module
# cat /etc/condor/config.d/rcc-flock.conf
# Setup the FLOCK_TO the RCC Factory
FLOCK_TO                                 = $(FLOCK_TO), uct2-bosco.uchicago.edu:<RCC_Factory_Port>?sock=collector
# Allow the RCC Factory server access to our SCHEDD
ALLOW_NEGOTIATOR_SCHEDD                  = $(CONDOR_HOST), uct2-bosco.uchicago.edu
# Who do you trust?
GSI_DAEMON_NAME                          = $(GSI_DAEMON_NAME), /DC=com/DC=DigiCert-Grid/O=Open Science Grid/OU=Services/CN=uct2-bosco.uchicago.edu
GSI_DAEMON_CERT                          = /etc/grid-security/hostcert.pem
GSI_DAEMON_KEY                           = /etc/grid-security/hostkey.pem
GSI_DAEMON_TRUSTED_CA_DIR                = /etc/grid-security/certificates
# Enable authentication from the Negotiator (this is required to run on glidein jobs)
SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = TRUE
<RCC_Factory_Port> should be replaced with the port number assigned by the MWT2 Administrators. 
The above GSI_ and SEC_ values are known to work with MWT2. It is possible that your Local Site HTCondor has other requirements which may require different values.
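After issuing condor_reconfig, one way to confirm that the new values are active is the standard condor_config_val tool:
# Verify the running configuration picked up the flocking changes
condor_config_val FLOCK_TO
condor_config_val GSI_DAEMON_NAME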
  Firewalls 
By default, the MWT2 Administrators will assign a RCC Factory Port number in a range starting at 11010 and incrementally increasing 
as sites are brought online and assigned numbers.
If your site is protected by a firewall, a port from this range may not work. If another port is more appropriate for use at your site, 
it is possible to use an alternative. This port number cannot already be in use by another RCC Factory on the RCC Factory Server. 
For example, if your site currently has the range of ports 9000-9700 open in the firewall, 
it is possible for a port in this range to be used as the RCC Factory Port.
If there are no existing holes in the firewall, it is the responsibility of your Local Site Administrator to request that a hole be opened in the firewall by your Local Network Administrators. This hole must be for the node "uct2-bosco.uchicago.edu" with, at minimum, a single port that MWT2 approves for use in the flocking.
If your Local Site HTCondor configuration uses the SHARED_PORT Daemon, the SHARED_PORT and RCC Factory Port may be the same value. 
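As an illustration only, on a node using an iptables-based host firewall, a rule along the following lines would admit the factory traffic (the port 11010 here is an assumed example; use the port assigned by MWT2):
# Allow inbound TCP from the RCC Factory Server on the assigned port
iptables -A INPUT -p tcp -s uct2-bosco.uchicago.edu --dport 11010 -j ACCEPT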
  Set up a Local Site SCHEDD server for Remote User Computing 
If a site does not have a working HTCondor installation, the following procedure can be used to easily set up a SCHEDD-only 
node enabled for RCC Flocking.
The following requirements must first be met
 
-  An operational Linux node running EL5 or EL6
  -  The node must be on a public network - it cannot be on a private network behind a NAT router.
  -  Proper forward/reverse DNS registration of the public network IP (a quick check is shown after this list).
  -  The host must have a CA
  -  The host must have a host certificate
  -  Root access to the node
  -  Any local firewall must have at least one port completely open in both directions (at a minimum open to all MWT2 subnets)
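A quick way to verify the forward/reverse DNS requirement (assuming the host utility from bind-utils is installed):
# Forward lookup: the node's FQDN should resolve to its public IP
host $(hostname -f)
# Reverse lookup: replace 128.174.118.140 with your node's public IP
host 128.174.118.140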
 
 
An attached script, 
RCCcondor.sh, is provided which will install a RCC HTCondor SCHEDD on this node.
Do not use this script blindly. Read it carefully and make certain the script will not make undesired changes to your Linux node.
It would be best to use a cleanly installed Linux node which can be rebuilt easily should the end result not be what was expected.
At least one change should be made to this script prior to execution. 
The variable 
RCC_Factory_Port must be changed to the value assigned to the Local Site by MWT2 Administrators.
This installation of HTCondor uses the 
SHARED_PORT feature. This allows HTCondor to use a single port in its communication with remote services.
This port must be open to all nodes which will need to contact this HTCondor installation. It is safe and easiest to completely open this port to the world.
If a more restrictive setting must be used, the port, at a minimum, must be fully open to all MWT2 subnets. The subnet values can be provided upon request.
The variable 
RCC_Shared_Port controls which port is used by 
SHARED_PORT. The default value is the value of 
RCC_Factory_Port, but it can be changed based on local site requirements. With the default setting, only a single port needs to be open within the firewall.
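For example, if the MWT2 Administrators assigned your site port 11010, the top of the script would be edited to read:
# Port assigned by the MWT2 Administrators
RCC_Factory_Port=11010
# By default, SHARED_PORT uses the same port
RCC_Shared_Port=${RCC_Factory_Port}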
This script will perform numerous actions on your behalf
 
-  Removes any currently installed HTCondor, logs, configuration files, etc.
  -  Downloads and installs the HTCondor yum repository
  -  Downloads and installs the current stable release of HTCondor
  -  Disables libvirtd services (only needed if the node is a hypervisor)
  -  Installs a local condor configuration file which enables RCC access
  -  Increases the default system limits for maximum file descriptors, memory, etc.
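Once the script has been reviewed and the variables set, it can be run as root (assuming the attachment was downloaded to the current directory):
# Review the script first, then execute as root
chmod +x RCCcondor.sh
./RCCcondor.sh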
 
 
Once this script has completed successfully, HTCondor can be started with
/etc/init.d/condor start
All HTCondor log files will be stored in the standard location
/var/log/condor
Inspect these logs for any errors.
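For example, a quick scan for problems across the daemon logs:
# Look for obvious problems in the HTCondor daemon logs
grep -i error /var/log/condor/*Log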
Once HTCondor has started successfully, you should be able to issue any standard HTCondor command. 
Test the installation by submitting a test job as described in 
Submit Test Jobs.
Download the attached hostname.cmd submit file.
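For reference, a minimal submit file of this kind looks roughly like the following (a sketch; the actual attached hostname.cmd may differ):
universe   = vanilla
executable = /bin/hostname
output     = hostname.out
error      = hostname.err
log        = hostname.log
queue
Submit the job with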
condor_submit hostname.cmd
You can then check on the status of the job with
condor_q
Once the job has executed, you can check the job's output in the output file (hostname.out).
  Test Job Example 
The following is an example of how to run a test job, check on its status and display any output
  Submitting a test job 
[ddl@lx0 hostname]$ condor_submit hostname.cmd
Submitting job(s).
1 job(s) submitted to cluster 20870.
  The job is queued but in the "Idle" state (I) 
[ddl@lx0 hostname]$ condor_q
-- Submitter: lx0.hep.uiuc.edu : <128.174.118.140:42710> : lx0.hep.uiuc.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
20870.0   ddl            12/4  15:15   0+00:00:00 I  0   0.0  hostname          
1 jobs; 1 idle, 0 running, 0 held
  The job has completed and is no longer in the queue 
[ddl@lx0 hostname]$ condor_q
-- Submitter: lx0.hep.uiuc.edu : <128.174.118.140:42710> : lx0.hep.uiuc.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
0 jobs; 0 idle, 0 running, 0 held
  The contents of the output file 
[ddl@lx0 hostname]$ cat hostname.out
################################################################################
#####                                                                      #####
#####        Job is running within a Remote Cluster Connect Factory        #####
#####                                                                      #####
##### Date: Wed Dec 4 15:16:29 CST 2013                                    #####
##### User: ruc.uiuc                                                       #####
##### Host: uct2-c275.mwt2.org                                             #####
#####                                                                      #####
################################################################################
uct2-c275.mwt2.org
  Installation script RCCcondor.sh 
The following is an explanation of the workings of the script 
RCCcondor.sh.
  Variables 
There are four variables near the top of the script
 
-  RCC_Factory_Port=
  -  RCC_Shared_Port=${RCC_Factory_Port}
  -  RCC_Factory_Server="uct2-bosco.uchicago.edu"
  -  RCC_Factory_DN="/DC=com/DC=DigiCert-Grid/O=Open Science Grid/OU=Services/CN=uct2-bosco.uchicago.edu"
 
 
  RCC_Factory_Port 
The Port used to designate a specific RCC Factory on the RCC Factory Server.
This port is assigned by the RCC Factory Server Administrator.
If the local site has firewall restrictions in place, a mutually agreed upon port number can be used.
  RCC_Shared_Port 
The Shared Port used by the local HTCondor SHARED_PORT daemon, which by default will be the same as the RCC Factory Port.
This port must be open in any local firewalls to the node specified by ${RCC_Factory_Server}.
This port number might need to be assigned by the local network administrator.
If there are no firewalls between this node and the RCC Factory Server, the default value will work.
  RCC_Factory_Server 
This is the RCC Factory Server. You should not need to change this value.
For MWT2, it is "uct2-bosco.uchicago.edu". 
  RCC_Factory_DN 
This is the DN of the RCC Factory Server. You should not need to change this value.
For MWT2, it is "/DC=com/DC=DigiCert-Grid/O=Open Science Grid/OU=Services/CN=uct2-bosco.uchicago.edu"
  Remove HTCondor 
To avoid any confusion, if a current version of HTCondor is installed, it is removed and all support files are deleted
# Stop any running condor
/etc/init.d/condor stop
# Remove it completely
yum -y remove condor
rm -rf /var/lib/condor
rm -rf /var/log/condor
rm -rf /etc/condor
rm -rf /etc/sysconfig/condor
rm -rf /etc/yum.repos.d/htcondor-stable-rh${myEL}.repo
  Install HTCondor 
The current HTCondor yum repositories are downloaded and installed.
The current stable release of HTCondor is then downloaded and installed.
The libvirtd services are turned off as they are not needed.
# Fetch the latest repository from HTCondor
(cd /etc/yum.repos.d; wget http://research.cs.wisc.edu/htcondor/yum/repo.d/htcondor-stable-rh${myEL}.repo)
# Reset the yum repo cache
yum clean all
# Install the new condor 
yum -y install condor.${myArch}
# Condor enables these but we want them off
chkconfig libvirtd off
chkconfig libvirt-guests off
  HTCondor configuration file 
A copy of 
/etc/condor/condor_config.local is created to set up HTCondor to participate in the RCC
rm -rf /etc/condor/condor_config.local
cat <<EOF>>/etc/condor/condor_config.local
# Condor configuration to allow a node to participate as a SCHEDD for a RCC
.
.
.
EOF
Several key components of this local configuration are described here
  DAEMON_LIST 
DAEMON_LIST                              = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, SHARED_PORT
The SCHEDD daemon is needed to "submit" jobs. 
To FLOCK jobs to the RCCFS, the NEGOTIATOR and COLLECTOR daemons are required.
The SHARED_PORT daemon is needed so that only one port is used in the FLOCKing communication.
  Shared Port  
USE_SHARED_PORT                          = TRUE
SHARED_PORT_ARGS                         = -p ${RCC_Shared_Port}
We enable the SHARED_PORT daemon and indicate which port to use.
  COLLECTOR_HOST 
COLLECTOR_HOST                           = \$(CONDOR_HOST):${RCC_Shared_Port}?sock=collector
The COLLECTOR must also be told to use the SHARED_PORT.
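An illustrative expansion, assuming RCC_Shared_Port=11010 on the example submit host used earlier:
COLLECTOR_HOST = lx0.hep.uiuc.edu:11010?sock=collector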
  FLOCK_TO 
FLOCK_TO                                 = \$(FLOCK_TO), ${RCC_Factory_Server}:${RCC_Factory_Port}?sock=collector
Enables FLOCKing to the RCC Factory Server on the assigned port
  Security changes 
ALLOW_NEGOTIATOR_SCHEDD                  = \$(CONDOR_HOST), ${RCC_Factory_Server}
GSI_DAEMON_NAME                          = \$(GSI_DAEMON_NAME), ${RCC_Factory_DN}
GSI_DAEMON_CERT                          = /etc/grid-security/hostcert.pem
GSI_DAEMON_KEY                           = /etc/grid-security/hostkey.pem
GSI_DAEMON_TRUSTED_CA_DIR                = /etc/grid-security/certificates
SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = TRUE
These changes allow the RCC Factory Server access to the local Negotiator
  MAX_FILE_DESCRIPTORS 
MAX_FILE_DESCRIPTORS                     = 20000
When using the SHARED_PORT daemon, all connections are via TCP.
Each submitted job uses 3 or more file descriptors to create the appropriate connections.
The value of 20000 will allow this HTCondor installation to handle over 5000 submitted jobs (20000 / 3 ≈ 6600).
  Increase system wide limits 
To handle the large demands which can be placed on HTCondor, 
various system-wide limits must be increased. The script places a file into limits.d
to modify these values. The important value is 
nofile, which must be larger than
the value given by 
MAX_FILE_DESCRIPTORS.
rm -rf /etc/security/limits.d/rcc.conf
cat <<EOF>>/etc/security/limits.d/rcc.conf
# Remove all the limits so we avoid trouble
* - nofile  1000000
* - nproc   unlimited
* - memlock unlimited
* - locks   unlimited
* - core    unlimited
EOF
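After this file is in place, a newly started login session should show the raised limit:
# Confirm the per-process open file limit (should report 1000000)
ulimit -n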