One of those things I never spend much time on was reading about NFS. There is no need for it, it just kinda works out the box. Most of the time that is. The headache starts when it stops working, or pushes your cluster to a grind. One of the first resources you will find is to check /proc/net/rpc/nfsd. But that does not help you much further, so I recently started monitoring the content. Let’s see what is what.
Getting the information is easy :
cat /proc/net/rpc/nfsd rc 0 225891120 1142621245 fh 94 0 0 0 0 io 2419795101 674116438 th 16 0 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ra 32 599168798 0 0 0 0 0 0 0 0 0 7053691 net 1369894951 0 1369046785 21585 rpc 1369784542 0 0 0 0 proc2 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 proc3 22 84 253414826 11612033 88162237 147421518 109847 606883959 183127783 11620524 290559 110168 0 11837554 213037 2453880 4605041 36226 12203037 1305364 112 56 20349016 proc4 2 1 548713 proc4ops 59 0 0 0 104291 0 0 0 0 0 218574 3 0 0 0 0 3 0 0 0 0 0 0 548712 0 1 0 330138 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Its a bit much, so let’s go line by line :
RC : Reply Cache
The reply cache is a cache that will temporary store idempotent operations, those are operation that can be repeated multiple times without failing. Think : reading a block over NFS. So when a reply gets lost due to bad connection (or other reasons) NFS does not have to do the operation again (performance gain), and the client does not see an error. While this seems a simple enough cache, the reported data is however a bit complex :
<hits> <misses> <nocache> 0 225891120 1142621245
- hits : client did not receive a reply and decided to re-transmit its request and the reply was cached. (bad)
- misses : an operation that requires caching (idempotent)
- nocache : non-idempotent operation (rename/delete/…)
This is how reply cache looks like for a busy server : (over the weekend)
On most servers I monitor I see almost no misses and only nocache, I have not seen any hits so far. Interesting article about the size/use of the reply cache (pdf).
FH : File Handles
File handles are small pieces of memory that keep track of what file is opened; However on all the machines I monitor these are all zero except for the first value stale file handles. Stale file handles happen when a file handle references a location that has been recycled. This also occurs when the server loses connection and applications are still using files that are no longer accessible. All other values are not used.
<stale> <total_lookups> <anonlookups> <dirnocache> <nodirnocache> 94 0 0 0 0
I believe a large amount of stale files are a bad sign, so far I haven’t seen this happen.
IO : Input Output
This is simple, it is the total amount of bytes read from disk (raid) and total amount of bytes written to disk since the last restart.
<read> <write> 2419795101 674116438
Example workload :
TH : Threads
This shows the information on the threads in the “NFS process”; a large part of the values however have been kicked out of the kernel upstream of 2.6.29.
<threads> <fullcnt> <deprecated_histogram> 16 0 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
- threads : amount of threads used by the NFS daemon. (this can be increased in /etc/sysconfig/nfs, option RPCNFSDCOUNT)
- fullcnt : amount of times all the NFS threads where used. (most likely deprecated)
For tuning how many threads are needed you could look at : /proc/fs/nfsd/pool_stats, explained here
RA : Read Ahead Cache
Read ahead cache, is a cache that is being used when a sequential blocks are expected to be requested. However I found no explanation saying so, so perhaps I’m wrong. When a certain block is requested as read, the next few blocks have a large chance to be requested as well, so the NFS thread already caches these.
<cachesize> <10%> <..> <100%> <notfound> 32 599168798 0 0 0 0 0 0 0 0 0 7053691
- cache size : always double of threads (no idea why)
- 10%-100% : histogram of depth the block was found.
- not found : not in read-ahead cache
In real live example : I only see the 10% and the not found increase. With the most being in 10%.
NET : Network
Statistics on network use.
<netcount> <UDPcount> <TCPcount> <TCPconnect> 1369894951 0 1369046785 21585
- netcount : count every packet
- UDPcount : count every UDP packet
- TCPcount : count every TCP packet
- TCPconnect : count every TCP connection
These values depend on how you use NFS (UDP or TCP) and how active the NFS is. The following graph is an example of plotted UDP/TCP over time. Clearly I prefer TCP over UDP connections. (TCP connections should following the line of TCP) Seeing allot of UDP traffic is suspicious, normally NFSv3 will default to TCP, NFSv2 did use UDP.
RPC : remote procedure call
NFS under Linux relies on RPC calls to route requests between client and server.
<count> <badcnt> <badfmt> <badauth> <badcInt> 1369784542 0 0 0 0
- rpccount : all RPC operations
- badcnt : combined value of next 3, or all bad calls
- badfmt : most bad calls
- badauth : bad authentication
- badcInt : unused
So far I haven’t seen much bad calls.
Proc 2 are the stats generated for NFS clients using v2 protocol. This is pretty old (described in 1989) and I could not find a good reason to use these over the v3 or v4 protocols. Except for some tests, I don’t think any current distribution still uses this. In fact in Centos 7 the proc2 line has been dropped totally. (and I assume support also)
The operations however haven’t change that much in v3 protocol, so its still worth looking what they mean.
- The first value indicated how many values are to follow. (18)
- null : do nothing (debug)
- getattr : get file attributes
- setattr : set file attributes
- root : get filesystem root (obsolete)
- lookup : look up file name
- readlink : read from symbolic link
- read : read from file
- wrcache : write to cache (not used)
- write : write to file
- create : create file
- remove : remove file
- rename : rename file
- link : create link to file
- symlink : create symbolic link
- mkdir : create directory
- rmdir : remove directory
- readdir : read from directory
- fsstat : get filesystem attributes
Proc 3 are subsequent the stats of v3 protocol stats. This was described in 1995 so its the most commonly used protocol. Its most likely to find decent statistics here. I only describe values that are different from v2 protocol.
- values to follow (22)
- null / getattr / setattr / lookup
- access : check access permission
- readlink / read / write / create / mkdir / symlink
- mknod : create a special device
- remove / rmdir / rename / link / readdir
- readdirplus : extended read from directory
- fsstat : get dynamic file system information
- fsinfo : get static file system information
- pathconf : retrieve POSIX information
- commit : commit cached data to stable storage (server side)
Working NFS server :
As you see by far most are read/writes. Remarkable allot of commit’s those are called after async writes … (this graph is no database work)
Proc 4 is the newest protocol (2000) and one major change is the use of compound. Instead of sending 3 operations to open a file (lookup, open, read), in v4 this can be done in one compound. This means that the line can be shortened to the two types of compounds : null (do nothing) and compound (do something). Since null is only used as debug, a proc4ops are generated.
These are the operation in v4, once the compound is ‘unpacked’. This is where thinks become really complex, beside the v4 there are also v4.1 and v4.2 extensions to the protocol. You will see between (current) distributions the value count differ pretty big. I found 40, 59 and 71 values in my environment. For my librenms script I used the 59 count. The 40 value’s are v4, the 59 are with v4.1 and the 71 are the v4.2 however you will need to dig into the kernel code to find out to see if the value order hasn’t changed. I assume it hasn’t.
- op0-unused op1-unused op2-future : unused values
- access / close / commit / create
- delegpurge : purge delegations awaiting
- delegreturn : return delegations
- getfh : get current file handle
- lock : create lock
- lockt : test for lock
- locku : unlock
- lookup : lookup filename
- lookupp : lookup parent directory
- nverify : verify difference in attributes
- openattr : open named attribute directory
- open_confirm : confirn open
- open_dgrd : reduce open file access
- putfh : set current filehandle
- putpubfh : set public filehandle
- putrootfh : set root filehandle
- read / readdir / readlink / remove / rename
- renew : renew a lease
- restorefh : restore saved filehandle
- savefh : save current filehandle
- secinfo : obtain available security
- setcltid : negotiate client id
- setcltidconf : confirm client id
- verify : verify same attributes (see nverify)
- rellockowner : release lock owner (obsolete in v4.1)
And that’s it… not that I’m an expert now, but at least I know what is happening, based on the charts generated. A while a go we noticed tons of getattr/readdir(plus) calls, and this way we knew some code was looking in a huge (500m+) directory’s.