Test installation of LFC

NOTE

This document is no longer the most current guide to LFC installation. Please see:

http://www.usatlas.bnl.gov/twiki/bin/view/Admins/InstallLFConOSG.html

This page is being kept here as a record only.

First pass (Aug 26 2008)

Followed instructions here: https://twiki.cern.ch/twiki/bin/view/LCG/LfcAdminGuide and in email from Alain Roy.

Started off with clean SL install on tier2-04.uchicago.edu
[root@tier2-04]# cat /etc/issue
Scientific Linux SL release 5.0 (Boron)

Set up MySQL:
[root@tier2-04]# yum install mysql-server
[...] 
[root@tier2-04]# chkconfig --levels 2345 mysqld on

Get Pacman:
[root@tier2-04]# cd /opt
[root@tier2-04 opt]# wget http://physics.bu.edu/pacman/sample_cache/tarballs/pacman-latest.tar.gz
[...]
Get LFC from Alain's Pacman cache:
[root@tier2-04 opt]# pacman -get http://vdt.cs.wisc.edu/test-cache/roy:LFC 
[...]  

LFC daemon wants to run as dedicated user lfcmgr. Do we need to make this account or does the VDT installer do this?

Looks like we have to do this step:

[root@tier2-04]#  useradd -c "LFC manager" -d /home/lfcmgr lfcmgr 

Note the following values are hardcoded, according to Alain:

NSCONFIGFILE=/opt/lcg/etc/NSCONFIG
LFCUSER=lfcmgr
LFCGROUP=lfcmgr

Initialize the MySQL databases:
[root@tier2-04]# INSTALL_DIR=/opt/lcg
[root@tier2-04]# mysql -u root -p < $INSTALL_DIR/share/LFC/create_lfc_tables_mysql.sql
[root@tier2-04]# mysql -u root -p
use mysql
GRANT ALL PRIVILEGES ON cns_db.* TO 'lfc'@localhost IDENTIFIED BY 'lfc_password' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON cns_db.* TO 'lfc'@LFC_HOST IDENTIFIED BY 'lfc_password' WITH GRANT OPTION;

Create /opt/lcg/etc/NSCONFIG, containing
lfcmgr/lfc_password@tier2-04.uchicago.edu/cns_db

Now how do I start it?

[root@tier2-04 init.d]# vdt-control --list
Service            | Type   | Desired State
-------------------+--------+--------------
fetch-crl          | cron   | enable
vdt-rotate-logs    | cron   | enable
vdt-update-certs   | cron   | enable
mysql5             | init   | do not enable

Copy init scripts to /etc/init.d
[root@tier2-04 ~]# cp /opt/lcg/etc/init.d/lfc* /etc/init.d
[root@tier2-04 ~]# chkconfig --add lfcdaemon
[root@tier2-04 ~]# chkconfig --add lfc-dli
[root@tier2-04 init.d]# /etc/init.d/lfcdaemon start
Starting lfcdaemon: bash: /opt/lcg/bin/lfcdaemon: Permission denied
lfcdaemon not started:                                     [FAILED]

What's wrong? Running the init script with sh -x I find:

+ /bin/bash -c 'ulimit -S -c 0 >/dev/null 2>&1 ; su -s /bin/bash lfcmgr -c "/opt/lcg/bin/lfcdaemon   -t 20 -c /opt/lcg/etc/NSCONFIG -l /var/log/lfc/log"'
bash: /opt/lcg/bin/lfcdaemon: Permission denied
This is because many directories under /opt/lcg are readable only by root, while the daemon runs as lfcmgr:

[root@tier2-04 lcg]# ls -ld /opt/lcg/bin
drwx------ 2 root root 4096 Aug 26 19:17 bin

Correct all perms:
[root@tier2-04 opt]# find . -type d  -exec chmod go+rx {} \;
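
Before running a blanket chmod like this, it may be worth listing which directories are actually closed. A minimal sketch, assuming a find that supports numeric -perm tests; the helper name list_closed_dirs is made up for illustration, and the demonstration runs on a scratch tree rather than on /opt:

```shell
# Hypothetical helper: list directories under a root that group/other
# cannot both read and traverse (i.e. that lack r-x for group and other).
list_closed_dirs() {
    find "$1" -type d ! -perm -055
}

# Demonstration on a scratch tree instead of /opt.
demo=$(mktemp -d)
mkdir -p "$demo/lcg/bin"
chmod 755 "$demo" "$demo/lcg"
chmod 700 "$demo/lcg/bin"          # mimic the root-only bin directory

list_closed_dirs "$demo"           # lists the closed bin directory

# Same fix as above, applied to the scratch tree.
find "$demo" -type d -exec chmod go+rx {} +

list_closed_dirs "$demo"           # nothing left to list
```

Running list_closed_dirs /opt/lcg first would show how much of the tree the chmod is about to open up.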

Now, it starts!
[root@tier2-04 lcg]# /etc/init.d/lfcdaemon start
Starting lfcdaemon:                                        [  OK  ]
[root@tier2-04 lcg]# 
[root@tier2-04 lcg]# /etc/init.d/lfc-dli start
Starting lfc-dli:                                          [  OK  ]
[root@tier2-04 lcg]# 

Next test: try lfc-ls

[root@tier2-04 lcg]# lfc-ls  /
send2nsd: NS009 - fatal configuration error: Host unknown: UNUSED
/: Host not known

Note: this host has a fresh Linux install and no host cert yet, which will keep this from working. A host cert is on the way.

Second pass (Aug 27)

Host cert installed.

[root@tier2-04]# /etc/init.d/lfcdaemon restart
Stopping lfcdaemon: /opt/lcg/bin/lfc-shutdown: error while loading shared libraries: libglobus_gssapi_gsi_gcc32dbgpthr.so.0: cannot open shared object file: No such file or directory
/etc/init.d/lfcdaemon: line 162: kill: (-13051) - No such process
/etc/init.d/lfcdaemon: line 162: kill: (13163) - No such process
                                                           [FAILED]
Starting lfcdaemon: /opt/lcg/bin/lfcdaemon: error while loading shared libraries: libglobus_gssapi_gsi_gcc32dbgpthr.so.0: cannot open shared object file: No such file or directory
lfcdaemon not started:                                     [FAILED]

This is trivial (did not source setup.sh)

[root@tier2-04]# . /opt/setup.sh 
[root@tier2-04]# /etc/init.d/lfcdaemon restart
Stopping lfcdaemon: send2nsd: NS000 - name server not available on tier2-04.uchicago.edu
nsshutdown: Name server not active
/etc/init.d/lfcdaemon: line 162: kill: (-13051) - No such process
/etc/init.d/lfcdaemon: line 162: kill: (13163) - No such process
                                                           [FAILED]

Starting lfcdaemon:                                        [  OK  ]
[root@tier2-04]# lfc-ls  /
send2nsd: NS009 - fatal configuration error: Host unknown: UNUSED
/: Host not known

Third pass (Sept 1)

The file /opt/lcg/etc/NSCONFIG was not correct: instead of

lfcmgr/lfc_password@tier2-04.uchicago.edu/cns_db

it should read
lfc/lfc_password@tier2-04.uchicago.edu
(The lfc database user is not the same as the lfcmgr system account)
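
A minimal sketch of writing the file in the corrected form. The helper name write_nsconfig is made up for illustration, and the restrictive mode 600 is my assumption (the file holds a database password), not something stated above. The demonstration writes to a scratch file rather than /opt/lcg/etc/NSCONFIG:

```shell
# Hypothetical helper: write an NSCONFIG file as  user/password@host
# Arguments: target file, database user, database password, database host.
write_nsconfig() (
    file=$1
    db_user=$2
    db_pass=$3
    db_host=$4
    printf '%s/%s@%s\n' "$db_user" "$db_pass" "$db_host" > "$file" || exit 1
    chmod 600 "$file"              # assumption: keep the password private
)

# Demonstration against a scratch file.
scratch=$(mktemp)
write_nsconfig "$scratch" lfc lfc_password tier2-04.uchicago.edu
cat "$scratch"
```

Since the daemon runs as lfcmgr, the real file would presumably also need to be owned by (or at least readable by) that account.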

Also, in order to run the clients, the environment variable LFC_HOST (and possibly LFC_PORT) must be set.

With this in place, the
lfc-ls /
command now hangs for a while, then prints
send2nsd: NS002 - send error : client_establish_context: The server had a problem while authenticating our connection
/: Could not secure the connection
instead of failing immediately. The lfc log file (/var/log/lfc/log) contains:
09/01 23:30:23 13342,0 Cns_serv: [128.135.158.197] (tier2-04.uchicago.edu): Could not establish an authenticated connection: server_establish_context_ext: Could not acquire credentials for the server; Csec_acquire_creds_GSI: Could not find any security certificate or proxy !

Fourth pass SUCCESS (Sep 2)

Reviewed https://twiki.cern.ch/twiki/bin/view/LCG/LfcAdminGuide#Host_certificate_key and saw that I had not installed the host cert/key correctly. In addition to /etc/grid-security, the host cert/key must be copied (not linked) to the /etc/grid-security/lfcmgr directory, with the following names and permissions.

cgw@tier2-04~$ ls -l /etc/grid-security/lfcmgr 
-rw-r--r-- 2 lfcmgr lfcmgr 1311 Aug 27 05:44 lfccert.pem
-r-------- 2 lfcmgr lfcmgr  887 Aug 27 05:44 lfckey.pem
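
A minimal sketch of that copy step. The helper name install_lfc_creds is made up for illustration; the source file names hostcert.pem and hostkey.pem are the usual ones under /etc/grid-security, and the modes match the listing above. The demonstration uses dummy files in a scratch directory, so no root access is needed to try it:

```shell
# Hypothetical helper: copy (not link) a host cert/key pair into the LFC
# directory under the names and modes shown in the listing above.
# Arguments: source dir, destination dir, optional owner[:group].
install_lfc_creds() (
    src=$1
    dest=$2
    owner=${3:-}
    mkdir -p "$dest" || exit 1
    cp -f "$src/hostcert.pem" "$dest/lfccert.pem" || exit 1
    cp -f "$src/hostkey.pem"  "$dest/lfckey.pem"  || exit 1
    chmod 644 "$dest/lfccert.pem"
    chmod 400 "$dest/lfckey.pem"
    if [ -n "$owner" ]; then
        chown "$owner" "$dest/lfccert.pem" "$dest/lfckey.pem"
    fi
)

# Demonstration with dummy files in a scratch tree.
demo=$(mktemp -d)
mkdir "$demo/grid-security"
echo "dummy cert" > "$demo/grid-security/hostcert.pem"
echo "dummy key"  > "$demo/grid-security/hostkey.pem"
install_lfc_creds "$demo/grid-security" "$demo/grid-security/lfcmgr"
ls -l "$demo/grid-security/lfcmgr"
```

On the real host this would be run as root, along the lines of: install_lfc_creds /etc/grid-security /etc/grid-security/lfcmgr lfcmgr:lfcmgr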

Now, after restarting the server, queries no longer hang, but I get the following reply.

cgw@tier2-04~$ export LFC_HOST=tier2-04.uchicago.edu
cgw@tier2-04~$ lfc-ls /
Could not get virtual id for /DC=org/DC=doegrids/OU=People/CN=Charles G Waldman 131209: Can't open configuration file !
/: No user mapping

According to the LfcAdminGuide, if things are working correctly this command should produce no output.

Consulting the LFC logfile /var/log/lfc/log, I see

09/02 21:23:24 23553,0 Cns_vo_from_dn: NS023 - /opt/lcg/etc/lcgdm-mapfile is not accessible
09/02 21:23:24 23553,0 sendrep: Could not get virtual id for /DC=org/DC=doegrids/OU=People/CN=Charles G Waldman 131209: Can't open configuration file !

I grabbed /etc/grid-security/gridmapfile from one of our other servers, and copied this to tier2-04 as /opt/lcg/etc/lcgdm-mapfile. Now, I get the expected "no reply" from the lfc-ls query, and see this message in the server log:

09/02 21:23:55 23553,0 getidmap: Creating a new Virtual gid for usatlas3
09/02 21:23:55 23553,0 Cns_srv_lstat: NS092 - lstat request by /DC=org/DC=doegrids/OU=People/CN=Charles G Waldman 131209 (101,101) from tier2-04.uchicago.edu
09/02 21:23:55 23553,0 Cns_srv_lstat: NS098 - lstat 0 /
09/02 21:23:55 23553,0 Cns_srv_lstat: returns 0
09/02 21:23:56 23553,0 Cns_srv_opendir: NS092 - opendir request by /DC=org/DC=doegrids/OU=People/CN=Charles G Waldman 131209 (101,101) from tier2-04.uchicago.edu
09/02 21:23:56 23553,0 Cns_srv_opendir: NS098 - opendir / 
09/02 21:23:56 23553,0 Cns_srv_opendir: returns 0
09/02 21:23:56 23553,0 Cns_srv_readdir: NS092 - readdir request by /DC=org/DC=doegrids/OU=People/CN=Charles G Waldman 131209 (101,101) from tier2-04.uchicago.edu
09/02 21:23:56 23553,0 Cns_srv_readdir: returns 0
09/02 21:23:56 23553,0 Cns_srv_readdir: NS092 - closedir request by /DC=org/DC=doegrids/OU=People/CN=Charles G Waldman 131209 (101,101) from tier2-04.uchicago.edu
09/02 21:23:56 23553,0 Cns_srv_readdir: returns 0
This is clearly running in a highly verbose mode. In production I don't think we'll want every access generating this much logging: it will produce more log files than we know what to do with, and it must also slow the server down. For now it's nice, though, because it shows us that the LFC daemon is working.

Post-installation testing (Sep 2)

The next step according to the LfcAdminGuide is:
As root, create the /grid directory: lfc-mkdir /grid

If I try to do this as myself, I get "Permission denied":
cgw@tier2-04$ . /opt/setup.sh 
cgw@tier2-04$ grid-proxy-init
Your identity: /DC=org/DC=doegrids/OU=People/CN=Charles G Waldman 131209
Enter GRID pass phrase for this identity:
Creating proxy ............................................................. Done
Your proxy is valid until: Wed Sep  3 11:18:37 2008
cgw@tier2-04$ export LFC_HOST=tier2-04.uchicago.edu
cgw@tier2-04$ lfc-ls /
cgw@tier2-04$ lfc-mkdir /grid
cannot create /grid: Permission denied
and the server log shows:
09/02 23:20:45 23553,0 Cns_srv_mkdir: NS092 - mkdir request by /DC=org/DC=doegrids/OU=People/CN=Charles G Waldman 131209 (101,101) from tier2-04.uchicago.edu
09/02 23:20:45 23553,0 Cns_srv_mkdir: NS098 - mkdir /grid  777 22
09/02 23:20:45 23553,0 Cns_srv_mkdir: returns 13

The simple approach of just running su does not work:
cgw@tier2-04$ su
root@tier2-04# lfc-mkdir /grid
lfc-mkdir: error while loading shared libraries: libglobus_gssapi_gsi_gcc32dbgpthr.so.0: cannot open shared object file: No such file or directory

The environment variable LD_LIBRARY_PATH is reset in the subshell where "su" is running. Setting LD_LIBRARY_PATH clears the library error but now we get "Bad credentials":
root@tier2-04# export LD_LIBRARY_PATH=/opt/glite/lib64:/opt/glite/lib:/opt/lcg/lib64:/opt/lcg/lib:/opt/mysql5/lib/mysql:/opt/globus/lib:/opt/berkeley-db/lib:/opt/expat/lib:
root@tier2-04# lfc-mkdir /grid
send2nsd: NS002 - send error : client_establish_context: Could not find or use a credential
cannot create /grid: Bad credentials

Now the server log message looks like this:
09/02 23:19:30 23553,0 Cns_serv: [128.135.158.197] (tier2-04.uchicago.edu): Could not establish an authenticated connection: server_establish_context_ext: The client claims it was not able to get a suitable credential to use to authenticate with us !

Maybe setting LD_LIBRARY_PATH was not enough; source the full setup.sh as root:
root@tier2-04# . /opt/setup.sh
root@tier2-04# lfc-mkdir /grid
send2nsd: NS002 - send error : client_establish_context: Could not find or use a credential
cannot create /grid: Bad credentials

Try grid-proxy-init as root:
root@tier2-04# grid-proxy-init
ERROR: Couldn't find valid credentials to generate a proxy.
Use -debug for further information.

Try copying over my user key and cert:
root@tier2-04# cp -a ~cgw/.globus /root
root@tier2-04# grid-proxy-init
Your identity: /DC=org/DC=doegrids/OU=People/CN=Charles G Waldman 131209
Enter GRID pass phrase for this identity:
Creating proxy ................................ Done
Your proxy is valid until: Wed Sep  3 11:19:36 2008

Now, we're right back where we started:
root@tier2-04# lfc-mkdir /grid
cannot create /grid: Permission denied

If I can create one directory and change its permissions, I can do everything else as a non-root user. But what combination of uid and grid credentials do I need to create and chmod that initial directory?

Next attempt to create /grid directory (Sep 3)

John Hover suggests:

-- /etc/ld.so.conf.d/lfc.conf contains 
/opt/globus/lib 
/opt/lcg/lib 
/opt/glite/lib 

When I create this file and run ldconfig, I get these warnings:
/sbin/ldconfig: /opt/glite/lib/libvomsc_gcc32dbgpthr.so.0 is not a symbolic link
/sbin/ldconfig: /opt/glite/lib/libvomsapi_gcc32dbgpthr.so.1 is not a symbolic link

root@tier2-04# cd /opt/glite/lib/
root@tier2-04# ls -l libvomsc_gcc32dbgpthr.so*
-rwxr-xr-x 1 root root 1847603 Aug 26 19:17 libvomsc_gcc32dbgpthr.so
-rwxr-xr-x 1 root root 1847603 Aug 26 19:17 libvomsc_gcc32dbgpthr.so.0
-rwxr-xr-x 1 root root 1847603 Aug 26 19:17 libvomsc_gcc32dbgpthr.so.0.0.0

These should be as follows:

lrwxrwxrwx 1 root root      30 Sep  4 14:18 libvomsc_gcc32dbgpthr.so -> libvomsc_gcc32dbgpthr.so.0
lrwxrwxrwx 1 root root      30 Sep  4 14:18 libvomsc_gcc32dbgpthr.so.0 -> libvomsc_gcc32dbgpthr.so.0.0.0
-rwxr-xr-x 1 root root 1847603 Aug 26 19:17 libvomsc_gcc32dbgpthr.so.0.0.0

There is a similar problem with libvomsapi_gcc32dbgpthr.so.1.0.8. After fixing these links, ldconfig no longer reports any errors.
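
A minimal sketch of the link repair, shown on dummy files in a scratch directory. The helper name relink_shared_lib is made up for illustration, and it assumes the three files really are identical copies (worth confirming with cmp before replacing anything):

```shell
# Hypothetical helper: replace duplicate copies with the usual symlink chain
#   NAME.so -> NAME.so.MAJOR -> NAME.so.FULLVERSION
# Arguments: library directory, base name, major version, full version.
relink_shared_lib() (
    cd "$1" || exit 1
    base=$2
    major=$3
    full=$4
    # Refuse to touch anything unless the fully versioned file exists.
    [ -f "$base.$full" ] || { echo "missing $base.$full" >&2; exit 1; }
    ln -sf "$base.$full" "$base.$major"
    ln -sf "$base.$major" "$base"
)

# Demonstration with dummy files instead of /opt/glite/lib.
demo=$(mktemp -d)
for f in libdemo.so libdemo.so.0 libdemo.so.0.0.0; do
    echo "dummy" > "$demo/$f"
done
relink_shared_lib "$demo" libdemo.so 0 0.0.0
ls -l "$demo"
```

For the two libraries named in the warnings, the calls would be something like: relink_shared_lib /opt/glite/lib libvomsc_gcc32dbgpthr.so 0 0.0.0 and relink_shared_lib /opt/glite/lib libvomsapi_gcc32dbgpthr.so 1 1.0.8, followed by ldconfig.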

However, the underlying problem remains.

John Hover also recommended running client and server after setting
 export GLOBUS_GSI_CERT_UTILS_DEBUG_LEVEL=5 
 export GLOBUS_GSSAPI_DEBUG_LEVEL=5 
 export CSEC_TRACE=1 
which turned out to be a valuable suggestion - there was an error with the file ownership on
/etc/grid-security/host{key,cert}.pem, which were owned by lfcmgr instead of root.

Fixed this; now root can run the needed lfc-mkdir /grid command.
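
A quick ownership check like the following might have caught this sooner. It is only a sketch: the helper name check_owner is made up for illustration, it assumes GNU stat (the -c %U option), and the demonstration runs against a scratch file owned by the current user:

```shell
# Hypothetical helper: report whether a file is owned by the expected user.
# Arguments: file, expected owner name. Returns 0 on match, 1 on mismatch.
check_owner() {
    owner=$(stat -c %U "$1") || return 2
    if [ "$owner" = "$2" ]; then
        echo "ok: $1 is owned by $owner"
    else
        echo "MISMATCH: $1 is owned by $owner, expected $2"
        return 1
    fi
}

# Demonstration on a scratch file.
scratch=$(mktemp)
check_owner "$scratch" "$(id -un)"
```

On the LFC host the check would be check_owner /etc/grid-security/hostcert.pem root and the same for hostkey.pem, while the copies under /etc/grid-security/lfcmgr should report lfcmgr. A mismatch on the host files would be fixed as root with chown root on both of them.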

Testing with remote users (Sep 5)

First attempt by Hironori Ito (BNL) to access the LFC failed, but this was for trivial reasons - the LFC port (5010) was being blocked by a firewall. Once this was resolved, the next error reported was:
[acas0004] /usatlas/u/hiroito/Programming/python > voms-proxy-info -all
subject   : /DC=org/DC=doegrids/OU=People/CN=Hironori Ito 564424/CN=proxy 
issuer    : /DC=org/DC=doegrids/OU=People/CN=Hironori Ito 564424 
identity  : /DC=org/DC=doegrids/OU=People/CN=Hironori Ito 564424 
type      : proxy 
strength  : 512 bits 
path      : /tmp/x509up_u9290 
timeleft  : 42:03:20 
=== VO atlas extension information === 
VO        : atlas 
subject   : /DC=org/DC=doegrids/OU=People/CN=Hironori Ito 564424 
issuer    : /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch 
attribute : /atlas/Role=production/Capability=NULL 
attribute : /atlas/lcg1/Role=NULL/Capability=NULL 
attribute : /atlas/Role=NULL/Capability=NULL 
attribute : /atlas/usatlas/Role=NULL/Capability=NULL 
attribute : /atlas/soft-valid/Role=NULL/Capability=NULL
[acas0004] /usatlas/u/hiroito/Programming/python > env |grep -i lfc 
LFC_HOST=tier2-04.uchicago.edu 
LFC_HOME=tier2-04.uchicago.edu
[acas0004] /usatlas/u/hiroito/Programming/python > lfc-ls /grid/atlas 
/grid/atlas: Could not secure the connection 

The server log has:
 
 9/04 19:22:35  5552,0 Cns_serv: [130.199.48.54] (unknown): Could not establish an authenticated connection: server_establish_context_ext: Error while getting voms credentials for /DC=org/DC=doegrids/OU=People/CN=Hironori Ito 564424; _Csec_get_voms_creds: Cannot find certificate of AC issuer for vo atlas; Csec_server_set_service_name: Could not set service name; Csec_get_peer_service_name: Could not Cgetnetaddress: Host not known !

I had not completed all the steps in the LfcAdminGuide, in particular the "Create one Directory per VO" section:
lfc-mkdir /grid/atlas
lfc-entergrpmap --group atlas
lfc-chown root:atlas /grid/atlas
lfc-chmod 775 /grid/atlas
lfc-setacl -m d:u::7,d:g::7,d:o:5 /grid/atlas

While carrying out these steps:
[root@tier2-04 lfc]# lfc-mkdir /grid/atlas
cannot create /grid/atlas: Internal error
[root@tier2-04 lfc]# ls /grid/atlas
ls: /grid/atlas: No such file or directory
[root@tier2-04 lfc]# lfc-ls /grid
[root@tier2-04 lfc]# lfc-ls /
grid
[root@tier2-04 lfc]# lfc-ls /grid
[root@tier2-04 lfc]# lfc-mkdir /grid/atlas
[root@tier2-04 lfc]# lfc-ls /grid
atlas
[root@tier2-04 lfc]# lfc-ls /grid/atlas
[root@tier2-04 lfc]# 
The "Internal error" is a bit worrisome but on a subsequent attempt it did not recur.

Hiro is now seeing a different problem, where lfc-ls works with a plain grid proxy, but not with voms-proxy-init. Referring to the troubleshooting guide I found this in the FAQ section:

https://twiki.cern.ch/twiki/bin/view/LCG/VomsSetup

On LFC & UI, /etc/grid-security/vomsdir contains VO VOMS server

$ ls -ld /etc/grid-security/vomsdir/
drwxr-xr-x    2 root  root  4096 Jun  8 15:07 /etc/grid-security/vomsdir/
$ ls /etc/grid-security/vomsdir
cclcgvomsli01.in2p3.fr.43
lcg-voms.cern.ch.1265
...

In this installation there is no /etc/grid-security/vomsdir at all; I am trying to find out where these files are supposed to come from. For a quick fix I copied them over from another host, and now Hiro reports success. But this needs to be fixed in the VDT installation.

-- CharlesWaldman - 1 Sep 2008
Topic revision: r12 - 24 Jul 2009, CharlesWaldman