Minutes Jan 25
Allocation of the new nodes
Tasks for storage node preparation:
- commission the 8024F (10 Gb switch) - Aaron
- test the 10 Gb throughput
Storage node test: 2 benchmarks for MD1200
- disk read and write speed
- make sure that the 10 Gb NIC works correctly
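The MD1200 benchmarks above could start with a plain sequential dd test before moving to a dedicated tool; a minimal sketch, where the file path and size are placeholders (in practice the file should sit on the MD1200-backed filesystem):

```shell
# Quick sequential write/read check with dd. /tmp is only a placeholder;
# point BENCHFILE at the MD1200-backed filesystem in practice.
BENCHFILE=/tmp/benchfile

# Sequential write of 1 GiB; conv=fdatasync forces data to disk before
# dd reports its rate, so the number is not just page-cache speed.
dd if=/dev/zero of="$BENCHFILE" bs=1M count=1024 conv=fdatasync 2>&1 | tail -n 1

# Drop the page cache first (as root) so the read actually hits the disk:
#   echo 3 > /proc/sys/vm/drop_caches
dd if="$BENCHFILE" of=/dev/null bs=1M 2>&1 | tail -n 1

rm -f "$BENCHFILE"
```

For the 10 Gb NIC an iperf client/server pair between two nodes would give the network-side number.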
Explore all the I/O possibilities for KVM, i.e. whether a VM can see disks other than the one KVM is installed on
- Nate: only the disk where KVM is installed is available; he does not know how to mount other disks
- Suchandra: up to 4 disks can be mounted on VMs
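One way to expose an extra host disk to a guest (per Suchandra's point that several disks can be mounted) is a libvirt `<disk>` stanza attached with virsh; the device names below are hypothetical:

```shell
# Hypothetical libvirt <disk> stanza exposing a second host disk (/dev/sdb)
# to a guest as /dev/vdb. It would be attached to a domain with:
#   virsh attach-device <guest> /tmp/extra-disk.xml --persistent
cat > /tmp/extra-disk.xml <<'EOF'
<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/sdb'/>
  <target dev='vdb' bus='virtio'/>
</disk>
EOF
grep "target dev" /tmp/extra-disk.xml
```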
All host systems (KVM and ESXi hosts) will have private IPs only. To access all the host nodes admins will bounce from itb1 or itb4.
Everyone should add to the table in NodesAllocation
links to twiki instructions documenting the components installed on that node.
Specify in the nodes column any service that will be running on that node.
Names of the real and virtual machines have been reviewed:
itb-kvmX - KVM servers
itb-esxX - VMware ESXi hypervisors
itb-SVC - naming convention for the ITB VMs, where SVC is the short name of the service running on the VM
vtb-ce (instead of
- Are there better names for the virtual tier 3 (currently
- Same for the submit hosts (
Add one machine to act as a web cache:
- it would be interesting to compare squid in a VM to squid on a bare machine (VM host)
- Nate recommends installing squid with the 200 GB profile (big host profile)
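A sketch of the squid disk-cache setting matching the 200 GB profile; the sizes are assumptions, not decided figures (squid's `cache_dir` size is given in MB):

```shell
# Sketch of a squid.conf fragment sized for the 200 GB big-host profile.
# 150000 MB for the cache leaves room for the OS and logs; adjust to taste.
cat > /tmp/squid-cache.conf <<'EOF'
cache_dir ufs /var/spool/squid 150000 16 256
cache_mem 256 MB
EOF
grep cache_dir /tmp/squid-cache.conf
```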
Rob suggested exploring how CVMFS works, e.g. comparing the following configurations (performance, load, traffic):
- all worker nodes pointing directly to CERN
- all nodes pointing to a local squid cache
- CVMFS mounted on one node and exported via NFS
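For the local-squid variant, the client side is a few lines in `/etc/cvmfs/default.local`; a minimal sketch, where `itb-squid` is a hypothetical proxy host name:

```shell
# Sketch of /etc/cvmfs/default.local pointing clients at a local squid.
# "itb-squid" is a placeholder; setting CVMFS_HTTP_PROXY=DIRECT instead
# would be the point-straight-at-CERN variant being compared.
cat > /tmp/default.local <<'EOF'
CVMFS_REPOSITORIES=atlas.cern.ch
CVMFS_HTTP_PROXY="http://itb-squid:3128"
EOF
grep CVMFS_HTTP_PROXY /tmp/default.local
```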
In the table add a memory/disk column for all the VM lines. Fill it in with the requirements estimated depending on the services running on that VM.
Currently there are 2 profiles possible in cobbler:
- small host 10 GB, probably used by most
- big host 200 GB, for the squid cache and little else
is running a lot of services. It may be good to move some of them to a different node. Suchandra would like to move the virtual Tier 3 VM on
after it has been used for some tests. This sounds OK.
itb1 to itb4 (four 1950 machines) are currently allocated to ITB and their future allocation is not clear. For now they will remain allocated to ITB.
Other nodes will be deprecated instead. Deprecate all ITB activity on the tier 3 nodes: edge5 () /edge6 (dCache) /edge7 (headnode)
Changing the name of the CE and SE may be complex. Look in the Tiers of ATLAS and panda configuration, and also in the panda monitor.
Marco will check what is needed to change a gatekeeper name.
- exercise to change the name of a gatekeeper: what is required
- uc-itb-edge7 -> uc-itb, pointing to a machine different from edge7
- an alias itb-ce could be used, pointing to
uc-itb-condor (even if this would require some reconfiguration in panda). For now the alias will point to the PBS gatekeeper (same as the existing one).
- possible migration path
- create alias
- investigate what is involved
- prepare a certificate that works with the new name
- migrate to the new name
- once this is done the transfer to the VM should be easy
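The "prepare a certificate that works with the new name" step starts from a CSR for the new host name; a minimal openssl sketch, where the fully qualified domain is an assumption and the actual request would go through the site's CA process, not self-signing:

```shell
# Generate a key and a certificate signing request for the new gatekeeper
# name. The CN below assumes a uchicago.edu FQDN for uc-itb; the real
# request would be submitted to the grid CA used by the site.
openssl req -new -newkey rsa:2048 -nodes \
    -keyout /tmp/uc-itb.key \
    -out /tmp/uc-itb.csr \
    -subj "/CN=uc-itb.uchicago.edu"

# Check the name embedded in the request:
openssl req -in /tmp/uc-itb.csr -noout -subject
```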
Other possibilities include migrating the machine to a VM keeping the old name (e.g.
for the CE).
High Throughput Parallel Computing (scheduling jobs using multiple slots on the same node, also called node affinity) is of high interest for OSG:
- find instructions for Condor
- instructions for PBS
- what can provide for OSG?
- VM shipping and Amazon EC2 instantiation
- The new machine room at UIUC will have one system admin and we are one of the customers (one of the guests using cycles). Would this help us? E.g. spin up a VM with SL 5.5 and use it for ATLAS
- Have a technical meeting with Kate where she explains what is possible and gives effort estimates
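On the Condor instructions for whole-node scheduling mentioned above: one known approach is advertising a single slot that owns the whole machine. A sketch of the startd configuration (the file path is illustrative; this would go in the node's local Condor config):

```shell
# Sketch of a condor_config.local fragment advertising one slot with all
# of the machine's cpus/memory/disk, so a single HTPC job can claim the
# whole node instead of one core-sized slot.
cat > /tmp/condor_config.local <<'EOF'
SLOT_TYPE_1       = cpus=100%, memory=100%, disk=100%
NUM_SLOTS_TYPE_1  = 1
NUM_SLOTS         = 1
EOF
grep SLOT_TYPE_1 /tmp/condor_config.local
```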
Campus grid incubator for UC could be an interesting activity:
- connect different resources on campus:
- start with ITB workernodes
- access Tier 2
- plug in authentication (CNetID)
How are ATLAS accounts, ATLAS releases, and OSG mounts handled on these new nodes?
- itb1 - will remain the home server and NFS server
- releases will be mounted only via CVMFS
- OSG mounts will be mounted from the Tier 2
Nate and Suchandra: disk partitioning and LVM. Nate's proposal has been accepted.
- 2 disks in RAID 1 (mirror) for system disk: 500 GB
- 4 disks in RAID 5 with LVM for VM space: 1.5 TB
- This configuration is more robust and there is less work in case of failure of system disks. There is plenty of disk space. If more disk space is needed this may change
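Assuming 500 GB drives (the figure given for the system pair), the usable space of the accepted layout works out as follows:

```shell
# Back-of-envelope usable capacity of the accepted layout,
# assuming six 500 GB drives.
DISK_GB=500

# RAID 1 over 2 disks: capacity of a single disk
echo "system (RAID 1):   ${DISK_GB} GB"

# RAID 5 over 4 disks: (n - 1) * disk size usable
echo "VM space (RAID 5): $(( (4 - 1) * DISK_GB )) GB"
```

i.e. about 1.5 TB of RAID 5 space for VM images on top of the 500 GB mirrored system disk.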
Nate, Suchandra, Marco: talked about the routing scheme. There is no need for a router; the host machines and the existing network will cover the current needs.
Suchandra will work on a deprecation plan for Grid Colombia machines.
Follow up from Suchandra after the meeting, covering a few points:
Partitioning on R610s
2 drives in RAID 1 for system
4 drives in RAID 5 for storage and kvm files
LVM vs. file I/O vs. physical
This will be the first test to get a baseline and understanding of trade-offs and performance before using a given method on
virtual networks / routing
KVM should support this but the VMware machines can be used as a fallback option if this is too difficult to set up or
Tier 3 deprecation -
Initially unchanged until the KVMs are set up properly
Stage 1 - Ancillary services
move dcache pool from edge5 to storage system
edge6 can be repurposed at this point
migrate vtb cache / pages and svn repository to VM (itb-web)
change osg-vtb and vtb-svn cnames to point to itb-web
edge2 can be repurposed at this point
Stage 2 - UC_ITB
get ITB services on new machines running correctly and without issues (cvmfs/user accounts included)
change panda endpoints for official and itb robot pilots
change OIM registration
edge7 can be repurposed at this point
An interesting point came up during the discussion of transitioning to a new CE. The grid certs for the CE have a host name
associated with them, and that name needs to match the hostname the submitter thinks it is contacting, so we won't be able to
play around with aliases during the transition to hide the machine.
I'll ask Steve about the fermilab setup to see about multiple gatekeepers submitting to various batch systems or a gatekeeper
submitting to multiple batch systems.
- 25 Jan 2011