SiteReportMay19

lsm-get failures

  • lsm-put issues continue to dominate the failure rates
  • Charles has been running diagnostics based on logfile data
  • Number of movers per pool: 500
  • lsm is supposed to handle its own timeouts
  • Pilot timeout to be increased from 30 minutes to 90 minutes
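
The timeout handling noted above could look roughly like the sketch below. It is not the actual lsm or pilot code: `run_with_timeout` is a hypothetical helper, and only the 90-minute figure comes from these notes. It relies on the standard `subprocess.run` timeout, which kills the child process when the limit expires.

```python
import subprocess
import sys

# Pilot limit from the notes above (raised from 30 minutes).
PILOT_TIMEOUT_S = 90 * 60


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (exit_code, timed_out).

    Hypothetical helper: subprocess.run kills the child and raises
    TimeoutExpired once timeout_s seconds have passed.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
        return result.returncode, False
    except subprocess.TimeoutExpired:
        return None, True


if __name__ == "__main__":
    # Stand-in for a transfer command; a real call would pass the lsm command line.
    print(run_with_timeout([sys.executable, "-c", "pass"], PILOT_TIMEOUT_S))
```

If lsm enforces its own, shorter limit this way, it can report a clean failure before the pilot's limit is reached; whether that matches the intended design is an assumption.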

Kernel update testing

  • This is to address the soft-lockup kernel error we've seen.
  • Built the latest kernel, 2.6.33-4, for one of MWT2_UC's worker nodes
  • uct2-c201-209 ready to install; will be running jobs
  • Also considered the SL 5.5 kernel via rpm (2.6.18-xx)
  • Charles studied optimizations during the build; lots of interesting findings to feed back to SL.

Squid

  • Squid now updated at IU and UC

ANALY_MWT2_X

  • Testing local site mover prototype from Charles using this analysis queue
  • Checksum for lsm-puts handled by reading back from worker node; no dependence on Bestman
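
A minimal sketch of a read-back checksum for a put follows. It is not Charles's prototype: the function names are hypothetical, a plain file copy stands in for the real transfer, and Adler-32 is an assumed choice of checksum.

```python
import shutil
import zlib


def adler32_of(path, chunk_size=1024 * 1024):
    """Return the Adler-32 of a file as an 8-digit hex string."""
    value = 1  # Adler-32 starting value
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            value = zlib.adler32(chunk, value)
    return format(value & 0xFFFFFFFF, "08x")


def put_with_readback(src, dst):
    """Copy src to dst, then verify by reading dst back (hypothetical helper)."""
    expected = adler32_of(src)
    shutil.copyfile(src, dst)  # stand-in for the real transfer step
    actual = adler32_of(dst)   # read the written copy back
    if actual != expected:
        raise IOError("checksum mismatch: %s != %s" % (actual, expected))
    return actual
```

Because the comparison uses only the two files, nothing in this sketch depends on a checksum reported by the storage service.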

Bringing more storage online

  • Using 707TB of 915TB (77%)
  • Current space report: http://www.mwt2.org/sys/space
  • Expect to bring another server (with 6 MD1000 shelves, 2 TB drives, 180 TB raw) within the week
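
The figures above can be checked with a few lines of arithmetic. The helper names are hypothetical, and 15 drives per MD1000 shelf is an assumption (a fully populated shelf) that is not stated in these notes.

```python
def usage_percent(used_tb, total_tb):
    """Percentage of total capacity currently in use."""
    return 100.0 * used_tb / total_tb


def raw_capacity_tb(shelves, drives_per_shelf, drive_tb):
    """Raw capacity before RAID and filesystem overhead."""
    return shelves * drives_per_shelf * drive_tb


if __name__ == "__main__":
    print(round(usage_percent(707, 915)))  # 77
    # Assumption: 15 drive bays per MD1000 shelf, all populated.
    print(raw_capacity_tb(6, 15, 2))       # 180
```

Usable space on the new server will be lower than the raw figure once RAID overhead is taken out.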

Space token graphing in Cacti

  • [Figure: graph_image.php.png]

Network and jobs activity (UC)

  • [Figure: graph_image-1.php.png]

  • [Figure: graph_image-2.php.png]


-- RobGardner - 18 May 2010
Topic revision: r2 - 19 May 2010, RobGardner