Archive for the ‘Openvz and Virtuozzo’ Category

One-liners for troubleshooting Virtuozzo load issues

June 9, 2013 1 comment

 

I wish to introduce various one-liners that can be used to troubleshoot load or performance issues within a Virtuozzo node.

To begin with, let us look at the situations that can degrade the performance of a Virtuozzo node. As on any server, heavy usage of CPU, memory, disk or network will degrade performance. A Virtuozzo node has an additional complexity: the load can also come from processes running inside a container. In that case we need to identify the problem container and deal with the processes inside it.

I have found these one-liners quite useful for finding out the problem container while troubleshooting Virtuozzo alerts.

 

1) Troubleshooting load issues caused by high CPU activity

 

=> Display the load average of each container,

/usr/sbin/vzlist -o ctid,laverage

OR

/usr/sbin/vzstat -t -s cpu|awk 'NF==10{print $0}'

=> Sometimes the above one-liners will not show the actual CPU usage inside the containers (possibly due to a delay in updating the stats), even though the load on the node is high. In this situation, the command below helps find the CPU-intensive containers by reading the load average from inside each container.

for i in `/usr/sbin/vzlist -H -o ctid`; do echo "CTID: ${i} `/usr/sbin/vzctl exec ${i} cat /proc/loadavg`"; done
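
The loop above prints the containers in CTID order, not by load. A minimal sketch for ranking its output, assuming each line has the form "CTID: <id> <1min> <5min> <15min> ..." as produced by that loop; the function name top_loaded is made up for this example:

```shell
# Rank "CTID: <id> <1min> <5min> <15min> ..." lines (read from stdin) by the
# 1-minute load average, highest first, and keep the top N (default 10).
# top_loaded is an illustrative name, not a Virtuozzo tool.
top_loaded() {
    # LC_ALL=C keeps "." as the decimal separator regardless of the locale.
    LC_ALL=C sort -rn -k3,3 | head -n "${1:-10}"
}
```

To use it, pipe the loop into the function, for example by appending "| top_loaded 5" after "done".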

=> List all containers whose status is not “OK”. This is quite helpful when the load average on the node is extremely high (above 1000),

/usr/sbin/vzstat -t|awk '{if(NF==10 && $2!="OK" && $1!="CTID")print $0}'

=> List the top 10 containers by the number of processes running inside the container,

/usr/sbin/vzlist -H -o ctid,numproc|sort -r -n -k2|head

 

2) Troubleshooting load issues caused by network activity

 

=> Sorts containers based on socket usage

/usr/sbin/vzstat -t -s sock|awk 'NF==10{print $0}'

=> Sorts containers based on TCP send buffer usage,

/usr/sbin/vzlist -H -o ctid,tcpsndbuf |sort -r -n -k2

=> Sorts containers based on TCP receive buffer usage,

/usr/sbin/vzlist -H -o ctid,tcprcvbuf |sort -r -n -k2

=> Sorts containers based on the highest inbound traffic (quite useful while troubleshooting network-related attacks),

/usr/sbin/vznetstat -r |awk '$3 ~ /G/ {print $0}'|sort -r -nk3

=> Sorts containers based on the highest outbound traffic (quite useful while troubleshooting network-related attacks),

/usr/sbin/vznetstat -r |awk '$5 ~ /G/ {print $0}'|sort -r -nk5

 

3) Troubleshooting performance issues caused by memory utilization

 

The ‘dmesg’ command shows the containers which have resource shortages (the number after “UB” is the CTID). Such a container could be the abusive one, so check the processes running inside it.

[root@virtuozzo ~]# dmesg|egrep -v '(SMTP-LOG|INPUT-DROP|LIMIT-PPS-DROP|FORWARD-DROP)'
TTL=64 ID=0 PROTO=UDP SPT=68 DPT=67 LEN=556
[1101732.300833] __ratelimit: 44 callbacks suppressed
[1101732.310531] Fatal resource shortage: kmemsize, UB 12215.
[1101742.294179] Fatal resource shortage: kmemsize, UB 12215.
[1101752.277368] Fatal resource shortage: kmemsize, UB 12215.
[1101752.393226] Fatal resource shortage: kmemsize, UB 12215.
[1105092.458621] __ratelimit: 101 callbacks suppressed
[1105092.468411] Fatal resource shortage: kmemsize, UB 12215.
[root@virtuozzo ~]#
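
When dmesg holds many such lines, it helps to count them per container and resource. A minimal sketch, assuming the message format shown above ("Fatal resource shortage: <resource>, UB <CTID>."); the function name shortage_summary is made up for this example:

```shell
# Count "Fatal resource shortage" kernel messages (read from stdin) per
# container and resource, and print "<count> <CTID> <resource>" with the
# most frequent pair first. shortage_summary is an illustrative name.
shortage_summary() {
    awk '/Fatal resource shortage:/ {
             res = $(NF-2); sub(/,$/, "", res)    # "kmemsize," becomes "kmemsize"
             ub  = $NF;     sub(/\.$/, "", ub)    # "12215." becomes "12215"
             count[ub " " res]++
         }
         END { for (k in count) print count[k], k }' | sort -rn
}
```

A typical call would be "dmesg | shortage_summary"; for the output above it would report kmemsize shortages for CTID 12215.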

 

4) Troubleshooting load issues caused by high disk I/O activity.

 

You can install ‘atop’ and spot problem processes at the top of the list when sorting by disk usage (‘D’). To get more information on using ‘atop’ refer url.

The above one-liners will help you identify the problem CTID or the process (PID) responsible for the performance issue. If you have found a process ID, use the ‘vzpid’ command to find the container in which the process is running, and then either renice or stop that process. If you have found a CTID, you can view the processes running inside that container using either the ‘vzps’ or the ‘vztop’ command, as shown below,

vztop -b -c -n 1 -E <CTID>

OR

vzps auxfww -E <CTID>

So, that is it guys. I sincerely hope you get to take away something helpful from all this.

Happy Hunting 😀

Comprehensive Analysis of /proc/user_beancounters : Virtuozzo

June 8, 2013

While troubleshooting issues related to a Virtuozzo VPS, we usually come across the ‘user_beancounters’ file in the “/proc” directory. This file matters only if the VPS uses the UBC or the combined SLM+UBC memory mode. It holds the resource control information about running virtual environments, so ‘/proc/user_beancounters’ basically represents your VPS’s allocation of various system resources (memory). It is thus a main indicator of how well the VPS works, how stable it is, and whether there is a resource shortage. If you face any trouble while running or installing applications on your VPS, one good way to find the source of the problem is to take a look at this file.

Let’s dig deeper into the details of this file.

In Parallels Virtuozzo Containers, the resource control settings for each container are stored in the configuration file “/etc/vz/conf/XXX.conf” (where XXX is the ID of the given CT). These settings are loaded and applied when the VPS starts, or on events such as the execution of “vzctl set CTID”. For running containers, the resource control settings are exposed via “/proc/user_beancounters”. One such file exists on the node and one inside each VPS. The file on the hardware node contains the resource control settings of all running VPSs. A pictorial representation of the file “/proc/user_beancounters” inside a VPS is shown below:

image1

A brief description of the various columns is given below,

UID: Indicates the ID of the container. In Virtuozzo each container is given a unique ID for ease of management.

RESOURCE: This field indicates the primary, secondary and auxiliary parameters in Virtuozzo. In order to get more details of these resources refer url

HELD: Indicates the current usage of the various resources.

MAXHELD: Indicates the maximum usage of the resource since VPS startup.

BARRIER & LIMIT: Give the soft limit and the hard limit of the Virtuozzo resource controls. Resource requests above the limit get denied.

FAILCNT: Shows the number of refused (denied) resource allocations since the VPS was started. A non-zero value in this column indicates a resource shortage: we need to either increase that particular resource or find the process responsible and optimize it. Otherwise it can cause odd issues with the services running inside the container, e.g. unexpected service downtime or intermittent website issues.

The following awk script lists all the containers with a non-zero value in the “failcnt” column, along with the resource name and the corresponding failcnt value. Save the script as “/root/failcnt.awk” or any name that you like.

/root/failcnt.awk

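# Print every container that has a non-zero failcnt, followed by the
# affected resources and their failcnt values.

# Print the results collected for the current container, then reset them.
function report(    j) {
    if (failcntflg == 1) {
        printf "\nCTID=%s", ctid;
        for (j = 1; j <= i; j++) { printf " %s ", vector[j]; delete vector[j]; }
    }
    failcntflg = 0; i = 0;
}

BEGIN { OFS = ""; i = 0; failcntflg = 0; }
{
    # First line of a container: "CTID: resource held maxheld barrier limit failcnt"
    if (NF == 7 && index($1, ":") > 0) {
        report();                  # the previous container is complete
        split($1, arr1, ":");
        ctid = arr1[1];
        if ($NF != 0) { i = i + 1; failcntflg = 1; vector[i] = $2 " " $NF; }
    }

    # Remaining lines of a container: "resource held maxheld barrier limit failcnt"
    if (NF == 6 && $NF != 0) { i = i + 1; failcntflg = 1; vector[i] = $1 " " $NF; }
}
# Without this call the last container in the file would never be printed.
END { report(); printf "\n" }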

Now run the script from the node as follows,

[root@adminahead ~]# awk -f /root/failcnt.awk /proc/user_beancounters

CTID=10592 lockedpages 13
CTID=13917 kmemsize 357 shmpages 4 physpages 5 oomguarpages 1 tcprcvbuf 755
CTID=13904 kmemsize 528 numothersock 1
CTID=13905 kmemsize 73 numothersock 1
CTID=13897 kmemsize 1 shmpages 4 tcprcvbuf 4751
CTID=10000000 numothersock 1986
CTID=10594 kmemsize 27 physpages 7 oomguarpages 1 tcpsndbuf 295136
CTID=12435 shmpages 4
CTID=12437 kmemsize 2 shmpages 2 tcprcvbuf 690
CTID=12441 shmpages 3
CTID=12438 shmpages 1 physpages 712 oomguarpages 73 tcpsndbuf 63
CTID=10651 physpages 15 oomguarpages 8
CTID=10611 physpages 24 oomguarpages 11
CTID=10623 numothersock 14
CTID=10570 physpages 6 oomguarpages 3
CTID=10578 physpages 517 oomguarpages 33
CTID=10603 physpages 49 oomguarpages 40
CTID=10633 physpages 87 oomguarpages 24
CTID=10610 numproc 71 physpages 2250 oomguarpages 472
[root@adminahead ~]#

As you can see from the above output, container “13917” shows non-zero failcnt values for the largest number of resources: “kmemsize”, “shmpages”, “physpages”, “oomguarpages” and “tcprcvbuf”. The first four of these are related to RAM. Upgrading the RAM of that VPS is a good suggestion, but it should be considered only after finding the resource-intensive process inside the container and optimizing it.

You can use the following commands to list out the memory intensive processes inside the container.

* Lists top 3 memory intensive processes,

ps aux | sort -nr -k 4 | head -3

OR

wget -O /root/ps_mem.py http://www.pixelbeat.org/scripts/ps_mem.py
python /root/ps_mem.py |tail -3

The “/proc/user_beancounters” file on the node can be monitored continuously to find the VPSs that are short of resources, and the corresponding VPS owners can be contacted for a resource upgrade or optimization.
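
Since failcnt only ever grows while a VPS is running, continuous monitoring comes down to comparing two copies of the file. A minimal sketch, assuming two snapshots of “/proc/user_beancounters” saved to files some time apart; the function name failcnt_growth and the snapshot file names are made up for this example:

```shell
# Compare two saved copies of /proc/user_beancounters and print
# "CTID resource +N" for every failcnt that grew by N between them.
# failcnt_growth is an illustrative name; $1 = older copy, $2 = newer copy.
failcnt_growth() {
    awk '
        function check(res,    key) {
            key = ctid " " res
            if (file == 1)
                old[key] = $NF + 0
            else if ((key in old) && ($NF + 0) > old[key])
                print ctid, res, "+" ($NF - old[key])
        }
        FNR == 1 { file++; ctid = "" }           # a new snapshot starts
        NF == 7 && $1 ~ /^[0-9]+:$/ {            # "CTID: resource ... failcnt"
            ctid = $1; sub(/:$/, "", ctid); check($2); next
        }
        NF == 6 && ctid != "" { check($1) }      # "resource ... failcnt"
    ' "$1" "$2"
}
```

For example, save one copy with "cat /proc/user_beancounters > /root/ubc.old", save a second copy to /root/ubc.new a few minutes later, and run "failcnt_growth /root/ubc.old /root/ubc.new". Only the resources that were refused in between are listed, which separates current shortages from old ones.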

Renaming Virtuozzo Container CTID

March 18, 2013

 

 

SITUATION: I want to rename a Virtuozzo container from CTID 14383000 to CTID 14383.

 

SOLUTION: Use the ‘vzmlocal‘ command to rename the container’s CTID.

 

Eg:

 

[root@node ~]# /usr/sbin/vzlist -a |grep 192.168.17.62
14383000 68 running 192.168.17.62 testvps.com
[root@node ~]# 
[root@node ~]# /usr/sbin/vzmlocal 14383000:14383
vzctl_conf_get_param(VZ_TOOLS_BCID) return 5
vzctl_conf_get_param(VZ_TOOLS_IOLIMIT) return 1048576
Moving/copying CT#14383000 -> CT#14383, [], [] ...
Moving private area '/vz/private/14383000'->'/vz/private/14383'
done
Copying/modifying config scripts of CT#14383000 ...
OfflineManagement CT#14383000 ...
done
OfflineManagement CT#14383 ...
done
Successfully completed
[root@node ~]# /usr/sbin/vzctl start 14383
[root@node ~]# /usr/sbin/vzlist -a |grep 192.168.17.62
14383 68 running 192.168.17.62 testvps.com
[root@node ~]# 

 

 

Categories: Openvz and Virtuozzo

Script for ssh login to Virtuozzo vps using expect and bash

February 8, 2013

 

 

OBJECTIVE: Automate login to Virtuozzo vps

 

INPUT :
1) Linux Virtuozzo vps IP address
2) Password of the user on the node.

 

ASSUMPTIONS:
1) The ‘expect’ package is installed on your local machine
2) The Linux VPS is running
3) The USERNAME variable in bash script is set as “username”. Change it to your Virtuozzo node username which is used for logging in.
4) Both scripts “smtpvpscheck.sh” and “expect.ex” are put under the same directory.

 

LOGIC:
1) The bash script accepts the Linux VPS IP address as input. From the VPS IP it finds the node IP using the ‘mtr’ command (the node is the second-to-last hop on the route to the VPS). It then prompts for the user’s password. The username is hard-coded in the script “smtpvpscheck.sh”; to use your own, modify it directly in the script.

 

In its final section, the bash script calls the expect script “expect.ex” and passes the node’s IP, the username (for ssh), the password and the VPS IP to it.

 

 

Bash script: smtpvpscheck.sh

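#!/bin/bash

# Username used for the ssh login to the Virtuozzo node; change it to yours.
USERNAME="username"

if [ $# -ne 1 ]; then
    echo "Usage: smtpvpscheck.sh <VPS IP>"
    exit 1
fi

VPSIP="$1"
echo "VPSIP $VPSIP"

# The node is the second-to-last hop on the route to the VPS.
NODEIP=$(mtr -nr "${VPSIP}" | tac | sed -n 2p | awk '{print $2}')
echo "NODEIP $NODEIP"

# Do not call this variable PWD: bash keeps the current directory in PWD.
read -r -s -p "Enter node password:" NODEPASS
echo

# expect.ex is looked up in the same directory as this script.
expect -f "$(dirname "$0")/expect.ex" "$NODEIP" "$USERNAME" "$NODEPASS" "$VPSIP"

unset NODEPASS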

 

2) The expect script “expect.ex” logs into the node using the username and password (passed as arguments) and then switches to the root user (using sudo su). It then finds the container ID (CTID) from the VPS IP address and enters the container using the “vzctl” command.

 

 

Expect script: expect.ex

log_user 0
# Arguments: node IP, ssh username, password, VPS IP.
# lindex returns the plain value even if it contains spaces or braces.
set NODEIP [lindex $argv 0]
set USERNAME [lindex $argv 1]
set PWD [lindex $argv 2]
set VPSIP [lindex $argv 3]

spawn ssh ${USERNAME}@${NODEIP}
expect "password"
send -- "${PWD}\r"
expect "${USERNAME}@"
send -- "sudo su\r"
expect "password" { send -- "${PWD}\r" }
expect "root@"
# Alternative CTID lookup through vzlist:
#send -- "/usr/sbin/vzctl enter `/usr/sbin/vzlist -o ctid,ip|grep ${VPSIP}|sed "s/\([0-9]\{1,9\}\).*/\1/"|sed -e "s/^\s*//"`\r"
send -- "/usr/sbin/vzctl enter `grep -il ${VPSIP} /etc/vz/conf/*.conf|cut -d\/ -f5|cut -d\. -f1`\r"
# Wait for the container prompt, whatever the CTID is.
expect -re {CT-[0-9]+-bash}
send -- "\r\n"
interact

SAMPLE RUN

 

reynoldp@w10:~/scripts/$ ./smtpvpscheck.sh 16.24.74.67
VPSIP 16.24.74.67
NODEIP 16.24.74.8
Enter node password:

CT-15160-bash-4.1#

 

 

SITUATIONS FOR USAGE

Normally, when troubleshooting SMTP spam abuse, we first find the node of the VPS, log in to the node, switch to the root user, find the CTID of the VPS from its IP address and finally enter the container using the “vzctl” command. This typically takes several minutes; with this script it takes a minute or two.
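
The node lookup in “smtpvpscheck.sh” trusts whatever the second-to-last line of the mtr report contains. A minimal sketch of a stricter version, assuming the usual "mtr -nr" report layout in which each hop line starts with a hop number such as "2.|--" followed by the address; the function name node_ip_from_mtr is made up for this example:

```shell
# Read an "mtr -nr" report from stdin and print the second-to-last hop,
# which the script above treats as the node of the VPS.
# Fails (exit status 1) if the report has fewer than two hops or if that
# hop did not answer ("???"). node_ip_from_mtr is an illustrative name.
node_ip_from_mtr() {
    awk '$1 ~ /^[0-9]+\.([|]--)?$/ { hop[++n] = $2 }
         END {
             if (n < 2 || hop[n-1] == "???") exit 1
             print hop[n-1]
         }'
}
```

It could replace the NODEIP line in the script with something like: NODEIP=$(mtr -nr "$VPSIP" | node_ip_from_mtr) || { echo "Could not find the node"; exit 1; }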

Virtuozzo : “Error: lock timeout exceeded”

August 4, 2012 1 comment

 

 

ISSUE: Installing or removing an application template in a container was failing with the error “Error: lock timeout exceeded”. The same error was reported in ‘/var/log/vztt.log’.

 

[root@virtuozzo ~]#  /usr/sbin/vzpkg install 1272 cpanel
Error: lock timeout exceeded
[root@virtuozzo ~]# /usr/sbin/vzpkg remove 1272 cpanel
Error: lock timeout exceeded
[root@virtuozzo ~]# tail /var/log/vztt.log
2012-06-12T14:16:02-0400 : Error: lock timeout exceeded
2012-06-12T14:20:45-0400 : Error: lock timeout exceeded
2012-06-12T14:26:02-0400 : Error: lock timeout exceeded
2012-06-12T14:26:02-0400 : Error: lock timeout exceeded
2012-06-12T14:31:35-0400 : Error: lock timeout exceeded
2012-06-12T14:36:02-0400 : Error: lock timeout exceeded
2012-06-12T14:36:02-0400 : Error: lock timeout exceeded
2012-06-12T14:45:08-0400 : Error: lock timeout exceeded
2012-06-12T14:46:02-0400 : Error: lock timeout exceeded
2012-06-12T14:46:02-0400 : Error: lock timeout exceeded
[root@virtuozzo ~]# 

 

 

FIX: There was a lock file (/vz/template/centos/5/x86_64/.lock) inside the VPS OS template directory which prevented the installation of application templates. You can use the strace command (strace -e trace=open /usr/sbin/vzpkg remove 1272 cpanel) to view the files opened by the vzpkg process. In my case strace reported the existence of the lock file. Removing the lock file fixed the issue for me.

 

[root@virtuozzo ~]# cat /vz/template/centos/5/x86_64/.lock 
[root@virtuozzo ~]# file /vz/template/centos/5/x86_64/.lock 
/vz/template/centos/5/x86_64/.lock: empty
[root@virtuozzo ~]# ls -l /vz/template/centos/5/x86_64/.lock
-rw------- 1 root root 0 Jun 12 15:16 /vz/template/centos/5/x86_64/.lock
[root@virtuozzo ~]# rm -f /vz/template/centos/5/x86_64/.lock
[root@virtuozzo ~]# 
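
Before deleting such a lock file, it is worth confirming that it is a leftover and that no vzpkg process is still running. A minimal sketch of one such check, based only on the age of the file; the function name lock_is_stale and the 60 minute default are assumptions made for this example, not Virtuozzo behaviour:

```shell
# Succeed (exit status 0) only if the given lock file exists and was last
# modified more than N minutes ago.
# $1 = lock file, $2 = minimum age in minutes (default 60, an arbitrary choice).
# lock_is_stale is an illustrative name, not a Virtuozzo tool.
lock_is_stale() {
    [ -n "$(find "$1" -maxdepth 0 -type f -mmin +"${2:-60}" 2>/dev/null)" ]
}
```

It could be combined with the removal like this: lock_is_stale /vz/template/centos/5/x86_64/.lock 60 && rm -f /vz/template/centos/5/x86_64/.lock. Age is only a heuristic, so a very long-running template operation could still hold an old lock.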

Virtuozzo: Failed to enter Container

August 4, 2012

 

 

ISSUE: I am unable to enter one of the VPSs on a Virtuozzo node using the “vzctl enter CTID” command. It fails with the error “enter failed. Failed to enter Container”.

root@virtuozzo# vzctl enter 1330
enter failed
Failed to enter Container 1330
root@virtuozzo# 

 

 

REASON: The VZFS symlinks from the Container private area to the system and application templates are somehow corrupted.

 

 

FIX: Use the “vzctl recover CTID” command to rewrite the original symlinks in the Container private area.

root@virtuozzo# vzctl recover 1330
...
...
root@virtuozzo# 

 

 

As per the Parallels documentation, the recover option doesn’t touch user data files, so there is no risk of losing data.


Script to check failcnt in /proc/user_beancounters of all vps

August 20, 2011

On OpenVZ and Virtuozzo nodes, resource overconsumption can sometimes result in unexpected behaviour of a VPS. The over-utilised resource is reported in “/proc/user_beancounters”. The following script helps whoever manages the main hardware node to find all the VPSs for which resource failures are reported, and to take proactive action based on the output.

This script needs to be executed from the main hardware node.

#!/bin/bash
# Filename: /root/vpsresourcecheck.sh
# The AWK rules are written to /root/checkresource.awk on every run


########################
### Func: checkall
### No of Arguments = 0
########################
function checkall {

   flag=0

   #Create a file containing awk rules
   echo -e "{\nif(NF==7 && match(\$1,\"[0-9]+:\") && \$NF!=0 )\nprint \$2;\nelse if(NF==6 && \$NF!=0)\nprint \$1;\n}" > /root/checkresource.awk

   # Check "/proc/user_beancounters" of all vps nodes and print the CTID of ones with failcnt
   #
   for i in $(vzlist -o ctid|grep -v CTID|awk '{print $1}')
   do
     vzctl exec ${i} cat /proc/user_beancounters > ~/tmp.txt
     if [ `awk -f /root/checkresource.awk ~/tmp.txt|wc -l` -gt 0 ];then
        echo "Resource over-usage in CTID: ${i}"
        flag=1
     else
        echo "CTID: ${i} is GOOD"
     fi
   done

   if [ $flag -eq 0 ];then
     echo -e "\nNo failcnt reported for any of the nodes; everything is good:)\n"
   fi

}

########################
### Func: reportfailcnt
### No of arguments = 1
########################
function reportfailcnt {
    # Called only from inside function checkvps
    # Takes exactly one argument: the CTID
      echo -e "{\nif(NF==7 && match(\$1,\"[0-9]+:\") && \$NF!=0 )\nprint \$2\":\"\$NF;\nelse if(NF==6 && \$NF!=0)\nprint \$1\":\"\$NF;\n}" > /root/checkresource.awk
      
      vzctl exec $1 cat /proc/user_beancounters > ~/tmp.txt
     
    # List out the resource name and the failcnt
      awk -f /root/checkresource.awk ~/tmp.txt
}

########################
### Func: checkvps
### No of arguments = 1
########################
function checkvps {
   # Takes exactly one argument: the CTID
   echo -e "{\nif(NF==7 && match(\$1,\"[0-9]+:\") && \$NF!=0 )\nprint \$2;\nelse if(NF==6 && \$NF!=0)\nprint \$1;\n}" > /root/checkresource.awk

   vzctl exec $1 cat /proc/user_beancounters > ~/tmp.txt

   if [ `awk -f /root/checkresource.awk ~/tmp.txt|wc -l` -gt 0 ];then
     echo -e "\n\nRESOURCE OVERUTILIZATION REPORTED FOR CTID: $1 \n"
     echo -e "\n++++++++++++++++++++++\nRESOURCE : FAILCNT\n+++++++++++++++++++++"
     reportfailcnt $1
     echo -e "++++++++++++++++++++++++"
     exit 1
   else
     echo -e "\nNo failure count reported for CTID: $1\n Everything is good:)\n" 
   fi

} 



########################
### Main section
########################

case "$1" in 
      checkall) 
          checkall
          ;;
      checkvps)
           # One argument required
            if [ $# -ne 2 ];then
             echo "Usage: /root/vpsresourcecheck.sh checkvps <CTID>"
             exit 1
            fi
 
           checkvps $2 
          ;;
      *)
       echo "Usage: /root/vpsresourcecheck.sh {checkvps <CTID>|checkall}"
       exit 1
esac


SAMPLE OUTPUT

[root@vz ~]# /root/vpsresourcecheck.sh checkall
CTID: 1 is GOOD
Resource over-usage in CTID: 1040
CTID: 1439 is GOOD
Resource over-usage in CTID: 4603
Resource over-usage in CTID: 5243
Resource over-usage in CTID: 5605
CTID: 5850 is GOOD
Resource over-usage in CTID: 8999
[root@vz ~]#

[root@vz ~]# /root/vpsresourcecheck.sh checkvps 8999

RESOURCE OVERUTILIZATION REPORTED FOR CTID: 8999

++++++++++++++++++++++
RESOURCE : FAILCNT
+++++++++++++++++++++
kmemsize:35
privvmpages:62
++++++++++++++++++++++++
[root@vz ~]# /root/vpsresourcecheck.sh checkvps 1

No failure count reported for CTID: 1
Everything is good:)

[root@vz ~]#
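
The script above rewrites /root/checkresource.awk and ~/tmp.txt on every call. A minimal sketch of the same failcnt filter as a shell function that reads from a pipe, so that no temporary files are needed; the function name ubc_failures is made up for this example:

```shell
# Read the contents of /proc/user_beancounters from stdin and print
# "resource:failcnt" for every resource with a non-zero failcnt.
# ubc_failures is an illustrative name, not part of the script above.
ubc_failures() {
    awk 'NF == 7 && $1 ~ /^[0-9]+:$/ && $NF != 0 { print $2 ":" $NF }
         NF == 6 && $NF != 0                     { print $1 ":" $NF }'
}
```

From the hardware node it could be used as: vzctl exec 8999 cat /proc/user_beancounters | ubc_failures. An empty result means that no failcnt is reported for that CTID.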