Minutes Jan 25

Allocation of the new nodes

Tasks for storage node preparation:
  • commission the 8024F (10 GbE switch) - Aaron
  • test the 10 GbE throughput (see the sketch below)
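A minimal sketch of how the throughput test could be driven from Python, assuming iperf is installed on both ends and an "iperf -s" server is already running on the storage node; the host name below is hypothetical:

    # Throughput check over the new 10 GbE link (sketch).
    # Assumes "iperf -s" is already running on the storage node.
    import subprocess

    STORAGE_NODE = "itb-storage.uchicago.edu"  # hypothetical host name

    def measure_throughput(host, seconds=30, streams=4):
        """Run an iperf client against `host` and return its raw report."""
        cmd = ["iperf", "-c", host,
               "-t", str(seconds),   # test duration
               "-P", str(streams),   # parallel streams, usually needed to fill 10 GbE
               "-f", "g"]            # report in Gbit/s
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    if __name__ == "__main__":
        print(measure_throughput(STORAGE_NODE))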

Storage node test: 2 benchmarks for the MD1200
  • disk read and write speed (a sketch of a simple sequential test follows this list)
  • make sure that the 10 GbE NIC works correctly
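A minimal sketch of a sequential read/write test, just to illustrate the kind of measurement; real runs would more likely use dd, iozone, or bonnie++. The mount point is hypothetical, and the file should be larger than RAM (or caches dropped) for the read number to be meaningful:

    # Sketch of a sequential read/write benchmark for the MD1200 array.
    import os
    import time

    TARGET = "/mnt/md1200/benchmark.dat"   # hypothetical mount point
    BLOCK = 1024 * 1024                    # 1 MiB blocks
    BLOCKS = 4096                          # 4 GiB total

    def write_test():
        buf = os.urandom(BLOCK)
        start = time.time()
        with open(TARGET, "wb") as f:
            for _ in range(BLOCKS):
                f.write(buf)
            f.flush()
            os.fsync(f.fileno())           # make sure data actually hits the disks
        return BLOCK * BLOCKS / (time.time() - start) / 1e6  # MB/s

    def read_test():
        # Note: reads may come from the page cache unless the file exceeds RAM
        # or caches are dropped between the write and read passes.
        start = time.time()
        with open(TARGET, "rb") as f:
            while f.read(BLOCK):
                pass
        return BLOCK * BLOCKS / (time.time() - start) / 1e6  # MB/s

    if __name__ == "__main__":
        print("write %.0f MB/s" % write_test())
        print("read  %.0f MB/s" % read_test())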

Explore all the I/O possibilities for KVM, i.e. whether a VM can see disks other than the one KVM is installed on
  • Nate: only the disk where KVM is installed is available; he does not know how to mount other disks
  • Suchandra: up to 4 disks can be mounted on the VMs (see the sketch below)
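A minimal sketch of attaching an extra block device to a running KVM guest through the libvirt Python bindings, which is one way the multi-disk setup could be exercised; the domain name and device path are hypothetical and libvirt-python is assumed to be installed:

    # Sketch: attach an additional virtio disk to a running KVM guest.
    import libvirt

    DISK_XML = """
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/vg_vm/extra1'/>   <!-- hypothetical volume on the host -->
      <target dev='vdb' bus='virtio'/>    <!-- appears as /dev/vdb in the guest -->
    </disk>
    """

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("itb-ce")      # hypothetical VM name
    dom.attachDeviceFlags(
        DISK_XML,
        libvirt.VIR_DOMAIN_AFFECT_LIVE | libvirt.VIR_DOMAIN_AFFECT_CONFIG)
    conn.close()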

All host systems (KVM and ESXi hosts) will have private IPs only. To access the host nodes, admins will bounce through itb1 or itb4.
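A minimal sketch of the bounce, using OpenSSH's ProxyCommand with -W; the host names are the ones from these minutes, while the gateway's FQDN is hypothetical:

    # Sketch: reach a private-IP host node by bouncing through itb1.
    import subprocess

    GATEWAY = "itb1.uchicago.edu"        # public-facing bounce host (hypothetical FQDN)
    TARGET = "itb-kvm3"                  # private-IP host node

    subprocess.run([
        "ssh",
        "-o", "ProxyCommand=ssh -W %h:%p {}".format(GATEWAY),
        TARGET,
        "uptime",                        # example remote command
    ], check=True)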

Everyone should add to the table in NodesAllocation links to the twiki instructions documenting the components installed on each node. Specify in the nodes column any service that will be running on that node.

Names of the real and virtual machines have been reviewed:
  • itb-kvmX - KVM servers
  • itb-esxX - VMware ESXi hypervisors
  • itb-SVC - naming convention used for the itb VMs, where SVC is a short name of the service running on the VM
  • vtb-ce (instead of itb-vce)
  • Are there better names for the virtual tier 3 (currently gc2-AAA)?
  • Same for the submit hosts (ui-AAA)

Add one machine to act as a web cache, itb-squid
  • it would be interesting to compare squid in a VM to squid on a bare machine (VM host); a simple timing sketch follows this list
  • Nate recommends installing squid with the 200 GB profile (big host profile)
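A minimal sketch of one way to compare the two setups: time a cold and a warm fetch of the same URL through the cache. The proxy endpoint and test URL are hypothetical placeholders:

    # Sketch: time repeated fetches through the squid cache.
    import time
    import urllib.request

    PROXY = "http://itb-squid:3128"      # hypothetical squid endpoint
    TEST_URL = "http://example.org/"     # placeholder; use a real, cacheable URL

    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": PROXY}))

    def timed_fetch(url):
        start = time.time()
        with opener.open(url) as resp:
            resp.read()
        return time.time() - start

    if __name__ == "__main__":
        print("cold fetch: %.3f s" % timed_fetch(TEST_URL))   # likely a cache MISS
        print("warm fetch: %.3f s" % timed_fetch(TEST_URL))   # should be a cache HIT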

Rob suggested exploring how CVMFS works, e.g. comparing the following configurations (performance, load, traffic); a small timing sketch follows this list:
  • all worker nodes pointing directly to CERN
  • all nodes pointing to a local Squid cache
  • CVMFS mounted on one node and exported via NFS
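One crude way to compare the three configurations is to time a cold traversal of a directory tree under the CVMFS mount on a worker node, once per configuration. The repository path below is an assumption:

    # Sketch: time a cold metadata walk of a CVMFS repository.
    import os
    import time

    CVMFS_DIR = "/cvmfs/atlas.cern.ch/repo/sw"   # assumed ATLAS repository path

    def walk_and_stat(root):
        """Walk the tree and stat every file, forcing CVMFS to fetch metadata."""
        count = 0
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                os.stat(os.path.join(dirpath, name))
                count += 1
        return count

    if __name__ == "__main__":
        start = time.time()
        n = walk_and_stat(CVMFS_DIR)
        print("stat'ed %d files in %.1f s" % (n, time.time() - start))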

Add a memory/disk column to the table for all the VM lines. Fill it in with the requirements estimated from the services running on each VM.

Currently there are 2 profiles possible in cobbler:
  • small host: 10 GB, probably used by most
  • big host: 200 GB, for the squid cache and little else

itb-kvm3 is running a lot of services. It may be good to move some of them to a different node. Suchandra would like to move the virtual Tier 3 VM to itb-kvm4 after it has been used for some tests. This sounds OK.

itb1 to itb4 (four 1950 machines) are currently allocated to ITB and their future allocation is not clear. For now they will remain allocated to ITB.

The other nodes will be deprecated instead. Deprecate all ITB activity on the Tier 3 nodes: edge5 (), edge6 (dCache), edge7 (headnode).

Changing the name of the CE and SE may be complex. Look in the Tiers-of-ATLAS and Panda configurations, and also in the Panda monitor.

Marco will check what is needed to change a gatekeeper name.
  • exercise to change the name of a gatekeeper: what is required
  • uc-itb-edge7 -> uc-itb, pointing to a machine different from edge7
  • one alias itb-ce could be used, pointing to uc-itb-pbs or uc-itb-condor (even if this would require some reconfiguration in Panda). For now the alias will point to the PBS gatekeeper (same as the existing one).
  • possible migration path
    • create alias
    • investigate what is involved
    • prepare a certificate that works with the new name (see the sketch after this list)
    • migrate to the new name
    • once this is done the transfer to the VM should be easy
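A minimal sketch, under assumptions, of checking which name the certificate presented by a gatekeeper actually carries (relevant both to the certificate step above and to Suchandra's later note that the cert name must match what the submitter uses). It relies on the third-party cryptography package; the alias and port are hypothetical, and a GSI service that insists on a client certificate during the handshake may close the connection before the check completes:

    # Sketch: fetch the server certificate from the gatekeeper and report its
    # subject CN, to verify it matches the alias clients will use.
    import socket
    import ssl

    from cryptography import x509
    from cryptography.x509.oid import NameOID

    ALIAS = "uc-itb.uchicago.edu"   # hypothetical alias
    PORT = 2119                     # classic GRAM gatekeeper port

    def server_cert_cn(host, port):
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE          # only inspecting, not validating
        with socket.create_connection((host, port), timeout=10) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                der = tls.getpeercert(binary_form=True)
        cert = x509.load_der_x509_certificate(der)
        return cert.subject.get_attributes_for_oid(NameOID.COMMON_NAME)[0].value

    if __name__ == "__main__":
        cn = server_cert_cn(ALIAS, PORT)
        print("certificate CN :", cn)
        print("contains alias :", ALIAS.lower() in cn.lower())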

Other possibilities include migrating the machine to a VM keeping the old name (e.g. uct3-edge7 for the CE).

High Throughput Parallel Computing (scheduling jobs using multiple slots on the same node, also called node affinity) is of high interest for OSG:
  • find instructions for Condor
  • find instructions for PBS (a minimal sketch follows this list)
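For the PBS side, a minimal sketch assuming a Torque/PBS cluster with 8-core worker nodes: the job requests all slots on one node with nodes=1:ppn=8, which is the basic whole-node / HTPC pattern. The core count and payload script are hypothetical:

    # Sketch: generate and submit a PBS job that claims all 8 cores of one node.
    import subprocess
    import tempfile

    PBS_SCRIPT = """#!/bin/sh
    #PBS -N htpc-test
    #PBS -l nodes=1:ppn=8
    #PBS -l walltime=01:00:00
    cd $PBS_O_WORKDIR
    # payload that uses all 8 cores of the node (hypothetical)
    ./run_multicore_payload
    """

    with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as f:
        f.write(PBS_SCRIPT)
        script_path = f.name

    subprocess.run(["qsub", script_path], check=True)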

Nimbus plan:
  • what can it provide for OSG?
  • VM shipping and Amazon EC2 instantiation
  • The new machine room at UIUC will have one system admin and we are one of the customers (one of the guests using cycles). Would this help us? E.g. spin up a VM with SL 5.5 and use it for ATLAS
  • Have a technical meeting with Kate where she explains what is possible and gives effort estimates

Campus grid incubator for UC could be an interesting activity:
  • connect different resources on campus:
    • start with ITB worker nodes
    • access Tier 2
  • plug in authentication (CNET ID)

How are ATLAS accounts, ATLAS releases, and OSG mounts handled on these new nodes?
  • itb1 - will remain the home server and NFS server
  • releases will be mounted only via CVMFS
  • OSG mounts will be mounted from the Tier 2

Nate and Suchandra: disk partitioning and LVM. Nate's proposal has been accepted.
  • 2 disks in RAID 1 (mirror) for the system disk: 500 GB
  • 4 disks in RAID 5 with LVM for VM space: 1.5 TB (a sketch of the LVM setup follows this list)
  • This configuration is more robust and there is less work in case a system disk fails. There is plenty of disk space; if more is needed this may change.
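A minimal sketch of how the RAID 5 volume could be carved up with LVM, assuming the hardware RAID array shows up as /dev/sdb; the device, volume group, and logical volume names are hypothetical:

    # Sketch: put LVM on the RAID 5 array and carve out space for one VM.
    # Run as root; names and sizes are illustrative only.
    import subprocess

    RAID5_DEVICE = "/dev/sdb"      # hypothetical: the 4-disk RAID 5 array
    VOLUME_GROUP = "vg_vm"         # volume group dedicated to VM storage
    LV_NAME = "itb-squid"          # example logical volume for one VM
    LV_SIZE = "200G"               # matches the "big host" profile

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    run(["pvcreate", RAID5_DEVICE])                # mark the array for LVM
    run(["vgcreate", VOLUME_GROUP, RAID5_DEVICE])  # create the volume group
    run(["lvcreate", "-L", LV_SIZE, "-n", LV_NAME, VOLUME_GROUP])  # carve a VM volume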

Nate, Suchandra, Marco: talked about the routing scheme. There is no need for a router; the host machines and the existing network will provide for the current needs.

Suchandra will work on a deprecation plan for Grid Colombia machines.

Follow up from Suchandra after the meeting; there were a few points:
  • Partitioning on the R610s:
    • 2 drives in RAID 1 for the system
    • 4 drives in RAID 5 for storage and KVM files
    • LVM vs. file I/O vs. physical: this will be the first test, to get a baseline and an understanding of trade-offs and performance before using a given method on the "production" setup
  • Virtual networks / routing: KVM should support this, but the VMware machines can be used as a fallback option if it is too difficult to set up or administer
  • Tier 3 deprecation: initially unchanged until the KVMs are set up properly
    • Stage 1 - Ancillary services
      • move the dCache pool from edge5 to the storage system
      • edge6 can be repurposed at this point
      • migrate the vtb cache/pages and the svn repository to a VM (itb-web)
      • change the osg-vtb and vtb-svn CNAMEs to point to itb-web
      • edge2 can be repurposed at this point
    • Stage 2 - UC_ITB
      • get the ITB services on the new machines running correctly and without issues (CVMFS and user accounts included)
      • change the Panda endpoints for the official and ITB robot pilots
      • change the OIM registration
      • edge7 can be repurposed at this point
                                                                                                                                  
An interesting point came up during the discussion of transitioning to a new CE: the grid certificate for the CE has a host name associated with it, and that name needs to match the hostname the submitter thinks it is contacting, so we won't be able to play around with aliases in the transition to hide the machine.
                                                                                                                                  
I'll ask Steve about the Fermilab setup to see about multiple gatekeepers submitting to various batch systems, or a gatekeeper submitting to multiple batch systems.

-- MarcoMambelli - 25 Jan 2011