Linux Vada not running on RH6/7

darrenj · March 21, 2017, 8:00pm

I’ve posted in https://community.squaredup.com/answers/question/permissions-needed-for-vada-in-linux/?show_answer=3795#answer_3795#comment-525

but then thought it’s better to start a new thread.

When I run VADA on Linux it just sits there and does nothing. This seems to be on RH6 and RH7 boxes. RH5 is fine.

When I run the Get Netstat CSV in SCOM I get a lot of info back on processes. I’ve pasted the last few lines (with IP addresses and server names sanitised).

server1,16389,cfagent,"/var/cfengine/bin/cfagent -Dfrom_cfexecd:scheduled_run",TCP,000.000.000.000,54620,000.000.000.000,5308,ESTABLISHED,000.000.000.000

server1,18515,sshd,"sshd: ausername [priv]",TCP,000.000.000.181,22,000.000.000.000,56947,ESTABLISHED,000.000.000.000

StdErr
ERROR: Process ID list syntax error.
********* simple selection *********  ********* selection by list *********
-A all processes                      -C by command name
-N negate selection                   -G by real group ID (supports names)
-a all w/ tty except session leaders  -U by real user ID (supports names)

We have a privileged run-as account configured in SCOM. I can ssh into the box and execute Netstat and PS with no issues.

If I copy/paste the script from

https://github.com/squaredup/Community.DataOnDemand.MP/blob/master/ManagementPacks/Community.DataOnDemand.Unix/Scripts/GetNetstatCSV.sh

It returns

unix 3 [ ] STREAM CONNECTED 14153 /var/run/dbus/system_bus_socket
unix 3 [ ] STREAM CONNECTED 18033
Unknown format type
[scomaccount@server1 ~]$

darrenj · March 22, 2017, 9:13am

netstat -tpn returns a - (dash) in the final column.

darrenj · March 22, 2017, 9:21am

Thanks for the help. For your two comments:

1. I'm lost on the & and &. I read that as replacing one character with an identical one (i.e. & with &)? So I'm confused :)

2. I ran sudo netstat -tpn

output sample is (IP address sanitized)

tcp6 0 0 1.1.1.1:80 2.2.2.2:47785 TIME_WAIT -
tcp6 0 0 1.1.1.1:55362 2.2.2.2:12201 ESTABLISHED 1468/java
tcp6 0 0 1.1.1.1:80 2.2.2.2:14464 TIME_WAIT -
tcp6 0 0 1.1.1.1:80 2.2.2.2:28805 TIME_WAIT -
tcp6 0 0 1.1.1.1:80 2.2.2.2:53221 TIME_WAIT -
tcp6 0 0 1.1.1.1:80 2.2.2.2:31029 TIME_WAIT -

viper · March 22, 2017, 9:33am

No worries, I meant replace:

&amp;

with & - didn’t see that it was of course displaying as & in my reply!

That output looks fine as the script then sends it to grep and filters out non-established connections (you can run netstat -tpn | grep ESTABLISHED if you wanted output as per the script).

You might have to execute it a couple of times to get the offending connections to show up. Do you know if these servers have any traffic that’s going to a kernel-owned port (such as NFS) rather than a process-owned one? That might explain the problem (and would mean we’d need to raise an issue on GitHub, as the script as it stands doesn’t support that).
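To spot the offending connections quickly, the grep above can be narrowed to ESTABLISHED rows whose last column is a bare dash. This is only a sketch: `established_without_owner` is a hypothetical helper name, not part of GetNetstatCSV.sh, and it assumes the usual `netstat -tpn` layout where the state is field 6 and PID/Program name is field 7.

```shell
#!/bin/sh
# Hypothetical helper (not from GetNetstatCSV.sh): reads `netstat -tpn` output
# on stdin and prints only ESTABLISHED rows with no owning process listed.
# Assumes field 6 is the state and field 7 is "PID/Program name".
established_without_owner() {
    awk '$6 == "ESTABLISHED" && $7 == "-"'
}

# Typical use on the affected server:
#   sudo netstat -tpn | established_without_owner
```

If this prints nothing on one run, repeating it a few times may catch connections that come and go.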

darrenj · March 22, 2017, 9:39am

grep ESTABLISHED showed better results, however two entries had a dash.

Not sure about the traffic question - this is getting past my Linux knowledge. But I’m happy to ask our Linux engineers - it will take a day to get a reply (it’s 10.30pm at the moment). So if you have some specific questions you want me to find out as well, let me know.

darrenj · March 22, 2017, 8:30pm

OK, answer I got was that those servers do mount an NFS share. They are not NFS servers, just clients.

can’t run lsof as it reports ‘command not found’. ss didn’t shed any light on the owning process either.

I found a little used RH7 server and ran VADA on it. Worked perfectly.

Tried another busier server and got the same issues as originally. So it’s something in the output that’s causing it to fall over I think.

bacus · August 23, 2017, 3:52pm

So the solution is to modify the script?

I have had an issue with VADA since my upgrade to 2016, and I am working with support too. It worked perfectly before this.

ERROR:

EXCEPTION:
System.ArgumentNullException: Value cannot be null.
Parameter name: s
   at System.IO.StringReader..ctor(String s)
   at SquaredUp.Connector.ScomTask.ScomTaskController.GetOutputAsTable(String output, String format)
   at SquaredUp.Connector.ScomTask.ScomTaskController.<>c__DisplayClass7_0.<Execute>b__0()

viper · March 22, 2017, 8:56am

The two sample lines look fine - can you confirm that all of the result rows have their PID and process information present? Also be aware when using the script outside SCOM you’ll need to decode the & references into & (this is required for nesting the script inside a management pack).

viper · March 22, 2017, 9:08am

Just saw your previous comment on the prior question - the script takes the PID returned from each Netstat result and calls the following commands with the pid concatenated on the end. So from your sample:
ps -o args= --pid 16389
ps -o comm= --pid 16389

So if that is valid syntax on your RH6/7 systems, it suggests that the concatenation didn’t work - which either means column 7 didn’t contain any data for a particular connection, or that it wasn’t in the form PID/Program Name.

Can you run netstat -tpn on those systems and verify what’s in the final column?
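To make the failure mode concrete, here is a minimal sketch of pulling the PID out of the final netstat column. It is an illustration only: `pid_from_netstat_line` is a hypothetical helper and the real GetNetstatCSV.sh may do the extraction differently. The two sample rows are taken from the netstat output posted in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not from GetNetstatCSV.sh): extract the PID part of the
# "PID/Program name" column (field 7) from one line of `netstat -tpn` output.
pid_from_netstat_line() {
    printf '%s\n' "$1" | awk '{ split($7, parts, "/"); print parts[1] }'
}

good='tcp6 0 0 1.1.1.1:55362 2.2.2.2:12201 ESTABLISHED 1468/java'
bad='tcp6 0 0 1.1.1.1:80 2.2.2.2:14464 TIME_WAIT -'

pid_from_netstat_line "$good"   # prints 1468
pid_from_netstat_line "$bad"    # prints - (there is no PID to extract)
```

When the column is just a dash, the value appended to `ps -o comm= --pid` is not a number, which would be consistent with the "Process ID list syntax error" shown in StdErr.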

darrenj · March 22, 2017, 9:08am

All the rows have their PID and process, except two rows which have - for the PID, but the process is listed. There are over 60 rows returned.
Not sure what you meant by decoding the & references - I’m not a Linux guy.

viper · March 22, 2017, 9:15am

Was that running as root (via sudo if that’s your setup)? A - isn’t a valid PID, and it likely means you don’t have privileges to extract process information for that connection…

viper · March 22, 2017, 9:17am

Just means replace all instances of & with & - it’s an XML thing relevant to the MPs, and SCOM automatically replaces them prior to running the script on your system
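For anyone copying the script out of the management pack source, the decoding can be done with sed rather than by hand. This is a sketch under assumptions: `decode_xml_entities` is a hypothetical helper name, and handling `&lt;` and `&gt;` alongside `&amp;` is a guess at what else might be escaped in the MP copy.

```shell
#!/bin/sh
# Hypothetical helper: turn XML-escaped characters back into plain shell syntax.
# &amp; is decoded last so that text such as "&amp;lt;" is not decoded twice.
# In the sed replacement, "\&" is a literal ampersand (a bare & would mean
# "the matched text").
decode_xml_entities() {
    sed -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&amp;/\&/g'
}

# Typical use (the output file name is just an example):
#   decode_xml_entities < GetNetstatCSV.sh > GetNetstatCSV.decoded.sh
```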

viper · March 22, 2017, 10:01am

Probably best to ask them just that - you can also try running lsof -i TCP:portnumber (where portnumber is the local port reported in the netstat output) to see if that can resolve the process owning the connection. Failing that, try sudo ss -tapn | grep :portnumber, which may or may not shed some light on the owning process as well (if any).
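As a small convenience, the local port can be pulled out of a netstat row and dropped into those two commands. A sketch only: `local_port_of` is a hypothetical helper, and it assumes the local address is field 4 with the port after the last colon.

```shell
#!/bin/sh
# Hypothetical helper: print the local port from one `netstat -tpn` row.
# Splitting on ":" and taking the last piece also copes with IPv6-style
# addresses such as ":::80".
local_port_of() {
    printf '%s\n' "$1" | awk '{ n = split($4, parts, ":"); print parts[n] }'
}

row='tcp6 0 0 1.1.1.1:55362 2.2.2.2:12201 ESTABLISHED 1468/java'
port=$(local_port_of "$row")

# Print the follow-up commands rather than running them here.
echo "sudo lsof -i TCP:$port"
echo "sudo ss -tapn | grep :$port"
```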

viper · April 3, 2017, 8:56am

lsof is only available to root and may not be installed on your system (you can use rpm to check). Sounds like it may be a kernel-owned process that’s the issue - I’ll raise an issue on GitHub for someone to look at it. Likely the task can just be modified to report the connection as unknown (since we know the source/destination, just not the process).
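A rough sketch of what "report the connection as unknown" could look like is below. It is not the actual fix in the management pack: `describe_owner` is a hypothetical helper, and the "pid,command" output format is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not the MP's actual fix): given the "PID/Program name"
# field from netstat, print "pid,command", or "unknown,unknown" when the field
# holds no numeric PID (for example a bare "-").
describe_owner() {
    field=$1
    pid=${field%%/*}                      # keep the text before the first "/"
    case $pid in
        ''|*[!0-9]*)                      # empty, or contains a non-digit
            printf '%s\n' 'unknown,unknown'
            return 0
            ;;
    esac
    comm=$(ps -o comm= -p "$pid" 2>/dev/null)
    printf '%s,%s\n' "$pid" "${comm:-unknown}"
}

describe_owner '-'    # prints unknown,unknown
```

The point of the guard is that ps is only called with a numeric PID, so rows without an owning process no longer abort the whole run.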

srobyeah · May 28, 2020, 2:10pm

Did you find a solution for this? I’m having the exact same issue.

bacus · May 28, 2020, 2:20pm

My solution was to recreate a new SCOM environment in parallel. Since I upgraded to SCOM 2019, I have this situation again.

Not sure if it is related to the Linux MP (Universal now for 2019, old RH versions are not supported anymore). I had to reimport old MPs to support RH 6-7, and it has been broken since then.