SHM memory usage Kamailio 1.4.4

Robin Vleij

SHM memory usage Kamailio 1.4.4

Hi guys,

Since 1.3.0 (now running 1.4.4) I'm seeing very slow growth in SHM
memory usage on our low-traffic setup (less than 5 cps per machine). I'm
looking for a starting point for investigating the cause. :)

I compiled Kamailio 1.4.4-notls with #define SHM_MEM_SIZE 4*32 in
config.h in production. For my testing setup I'm running on the standard
32 there.

After about 3 weeks of uptime I start top, sort by memory size and find
that the kamailio processes (I'm running with 16 children) all have about
40 MB in the SHR column. My understanding is that this figure should also
go down, but it only goes up, slowly. More CPS (for example a benchmark
using sipp) makes it go up faster, but it never seems to go down. I think
this is wrong, but I could be wrong myself. :)

On a separate machine with no traffic I compiled in the memory debugging
according to the "memory troubleshooting" page on the wiki. That gives LOTS
of info in the logs. I also ran it under valgrind, but didn't find anything
interesting (then again, I'm not really a dev myself).

My plan now is to remove our acc module (compiled with radius
support) and see if it's maybe that module that's causing this. My test
on this traffic-less machine is as follows: start, run 20cps for a while
(we do no registers, just routing and auth) and note the SHR data from
top. Then, according to my understanding, this figure should drop
after a period of 20 minutes with no traffic. Is this a correct assumption?
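
The test described above could be automated by sampling the shared resident size of each process before the load, right after it, and again after the idle period. What follows is a minimal sketch, assuming a Linux host where /proc/<pid>/statm is readable; the function names are made up for illustration. The third statm field is, as far as I know, the value top shows as SHR. Keep in mind that it counts shared pages the process has touched at least once, so it may stay high even when the memory is free again inside Kamailio's own pool (an assumption about how the number behaves, not something confirmed in this thread).

```python
import os

# Page size in KiB, so results are comparable to top's default units.
PAGE_KIB = os.sysconf("SC_PAGE_SIZE") // 1024


def parse_statm(text, page_kib=PAGE_KIB):
    """Return (virt, res, shr) in KiB from the contents of /proc/<pid>/statm."""
    fields = text.split()
    size, resident, shared = (int(f) for f in fields[:3])
    return size * page_kib, resident * page_kib, shared * page_kib


def read_shr_kib(pid):
    """Shared resident size of one process in KiB (hypothetical helper)."""
    with open(f"/proc/{pid}/statm") as fh:
        return parse_statm(fh.read())[2]
```

Calling read_shr_kib() for every kamailio PID at the three points in time gives numbers that can be compared directly, instead of reading them off a live top screen.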

On the test setup the top data looks like this after about 10 calls:

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15975 root      18   0 69240 1536  568 S  0.0  0.6   0:00.00 kamailio
15972 root      18   0 69240 1568  588 S  0.0  0.6   0:00.00 kamailio
15970 root      18   0 69240 1568  588 S  0.0  0.6   0:00.00 kamailio
15969 root      19   0 69244 1488  568 S  0.0  0.6   0:00.00 kamailio
15968 root      15   0 69240 1860  936 S  0.0  0.7   0:00.00 kamailio
15967 root      15   0 71436 3376 2268 S  0.0  1.3   0:00.00 kamailio
15966 root      15   0 71436 3396 2288 S  0.0  1.3   0:00.00 kamailio
15965 root      23   0 69240 5552 4640 S  0.0  2.1   0:00.02 kamailio
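
For comparing snapshots like the one above, the SHR column can be pulled out of pasted top output with a few lines of Python. This is only a sketch: it assumes the default top column order shown in the header row above and values in KiB, and the function name is hypothetical.

```python
TOP_ROWS = """\
15975 root      18   0 69240 1536  568 S  0.0  0.6   0:00.00 kamailio
15972 root      18   0 69240 1568  588 S  0.0  0.6   0:00.00 kamailio
15970 root      18   0 69240 1568  588 S  0.0  0.6   0:00.00 kamailio
15969 root      19   0 69244 1488  568 S  0.0  0.6   0:00.00 kamailio
15968 root      15   0 69240 1860  936 S  0.0  0.7   0:00.00 kamailio
15967 root      15   0 71436 3376 2268 S  0.0  1.3   0:00.00 kamailio
15966 root      15   0 71436 3396 2288 S  0.0  1.3   0:00.00 kamailio
15965 root      23   0 69240 5552 4640 S  0.0  2.1   0:00.02 kamailio
"""


def shr_by_pid(rows, command="kamailio"):
    """Map PID -> SHR (KiB) for every row whose COMMAND column matches."""
    result = {}
    for line in rows.splitlines():
        cols = line.split()
        # Expect 12 columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(cols) < 12 or cols[-1] != command:
            continue
        result[int(cols[0])] = int(cols[6])
    return result


shr = shr_by_pid(TOP_ROWS)
largest_pid = max(shr, key=shr.get)
print(largest_pid, shr[largest_pid])  # 15965 4640
```

The per-process values should not be added up, since the same shared pages are counted once for every process that has touched them.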

As far as I know, the SHR entries never go down. When running with
very little SHM in config.h, the process runs out of shm memory and
complains, as expected.

Are my assumptions about all of this correct?

--
Robin Vleij
[hidden email]

_______________________________________________
sr-dev mailing list
[hidden email]
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
Henning Westerholt

Re: SHM memory usage Kamailio 1.4.4

On Friday, 6 November 2009, Robin Vleij wrote:
> Since 1.3.0 (now running 1.4.4) I'm seeing a very slow uptake of SHM
> memory on our low traffic setup (less than 5 cps per machine). I'm
> looking for some basis to go further on in my research to the cause. :)
>
> I compiled Kamailio 1.4.4-notls with #define SHM_MEM_SIZE 4*32 in
> config.h in production. For my testing setup I'm running on the standard
> 32 there.
>
> After about 3 weeks uptime I start top, sort on memory size and find the
> kamailio processes (I'm running with 16 children) to all have about 40mb
> in the SHR column. My understanding is that this should also go down,
> but it only goes up, slowly. More CPS (for example a benchmark using
> sipp) makes this go up faster, but it never seems to go down this
> figure. I think this is wrong, but I could be wrong myself. :)
>
> On a separate machine with no traffic I compiled the memory debugging
> according to the "memory troubleshooting" page on the wiki. LOTS of info
> in the logs. Also ran with valgrind, didn't find anything interesting
> (but I'm no dev myself really).
>
> My plan now is to take away our acc module (compiled with radius
> support) and see if it's maybe that module that's causing this. My test
> on this traffic-less machine is as follows: start, run 20cps for a while
> (we do no registers, just routing and auth) and note the SHR data from
> top. Then according to my understanding this figure should drop down
> after a period of 20 minutes with no traffic. Is this a right assumption?
> [..]
> As far as I know, it never goes down, the SHR entries. When running with
> very little SHM in config.h, the process goes out of shm memory and
> complains, as expected.
>
> Are my assumptions about all of this correct?


Hello Robin,


do you experience any problems in your setup when you use a reasonable SHM mem size? In my experience the size of the SHM memory (as displayed by top) depends on the load of the machine. But there is a certain level of shared memory that is used regardless of the load. Even if the machine has been completely idle for a long time, this memory will not be reclaimed. On one of our test systems, for example, there is a process that has 11 MB SHM at the moment, even though it's completely idle.


For the VIRT column (again in top) it's another story: it will just show something like SHM + PKG memory size, regardless of the actual load.


If you have a real memory leak in shared memory, then after a certain time the server will report memory allocation errors. Otherwise I don't think it's something to worry about.


Regards,


Henning


Robin Vleij

Re: SHM memory usage Kamailio 1.4.4

Henning Westerholt wrote:

Hi Henning!

> do you experience any problems in your setup when you use a reasonable
> SHM mem size? In my experience the size of the SHM memory (as displayed

I've had problems finding a "reasonable" shm mem size. :) The default is
something like 32MB, which runs out quickly when customers do "funny stuff"
(read: loops). Now I'm compiling with #define SHM_MEM_SIZE 4*32. 128MB
should be enough to last quite a while. So there's no immediate memory
problem or crashes (when it's full, it gets errors and stops processing
traffic properly). But right now for example, after a "funny" customer, I'm
seeing over 40MB per child in top (16 children). That won't go down
anymore, so we'll have to see how long it holds.
What do you suggest for SHM sizes?

> machine has been completely passive over a longer time, it will not
> reclaim this memory. On a certain test system for example there is one
> process that has 11MB SHM at the moment, even if it's completely idle.

OK. We often run very long on 10-20MB per process (all processes have
about the same, at least the children that process UDP), but like today
when someone has a problem and it becomes sip-spaghetti it jumps up to
40MB and then continues to slowly rise from there. Doesn't feel good to
be able to hit some kind of roof with the same traffic load.

> For the VIRT column (again top) it's another story, here it will just
> show something like SHM + PKG memory size, regardless of the actual load.

Virt shows 421MB right now for me. I figured that's what you describe:
the PKG memory of each process + the SHM.

> If you've a real memory leak in shared memory then after a certain time
> interval the server will report memory allocation errors. Otherwise I
> don't think it's something to worry about.

It does, if I don't make the limit higher. So say that I'm running on
32, then if I would hit that after some weeks uptime it would start
reporting memory allocation errors in different parts of my config and
stop doing important stuff. I also reproduced this assigning a small
amount to a dev machine and then sending 20cps to the machine.

On a test machine I have like 4 processes all using 600kb or so, then
after 20 calls it'll go up to something like

31409 root      15   0 94672 1936 1052 R  0.0  0.7   0:00.00 kamailio
31408 root      15   0 94784 3072 2068 S  0.0  1.2   0:00.00 kamailio
31407 root      15   0 94784 3072 2068 S  0.0  1.2   0:00.00 kamailio
31406 root      25   0 94672 5428 4556 S  0.0  2.1   0:00.02 kamailio

And it shrinks only a little after 15-20 minutes or so (often a bit
faster if load is low).

If this is a leak, it'll be almost impossible to find. I can't run
production with memlog or debug on, and in dev it's quite hard to
reproduce it seems. Not sure what to expect. :)

--
Robin Vleij
[hidden email]

Henning Westerholt

Re: SHM memory usage Kamailio 1.4.4

On Monday, 9 November 2009, Robin Vleij wrote:
> > do you experience any problems in your setup when you use a reasonable
> > SHM mem size? In my experience the size of the SHM memory (as displayed
>
> I've had problems finding a "reasonable" shm mem size. :) Standard is
> like 32MB, which runs out quickly when customers do "funny stuff" (read:
> loops). Now I'm compiling with #define SHM_MEM_SIZE 4*32. 128MB should
> be enough to hold pretty long.


Hi Robin,


btw, there is no need to recompile the server just to change this setting; it's a normal command-line parameter of the daemon binary. 128 MB should be absolutely fine, given the load you quoted.


> So there's no immediate memory problem or
> crashes (when it's full, it gets errors and stops processing traffic the
> right way). But right now for example, after a "funny" customer, I'm
> seeing over 40mb per child in top (16 children). That won't go down
> anymore, so we'll have to see how long it holds.
> What do you suggest for SHM sizes?


With today's memory sizes and prices you could use, for example, 512 MB, which should give you plenty of room even under really abnormal load conditions. And as it's shared, you'll still have plenty of room for e.g. the database.


> > machine has been completely passive over a longer time, it will not
> > reclaim this memory. On a certain test system for example there is one
> > process that has 11MB SHM at the moment, even if it's completely idle.
>
> OK. We often run very long on 10-20MB per process (all processes have
> about the same, at least the children that process UDP), but like today
> when someone has a problem and it becomes sip-spaghetti it jumps up to
> 40MB and then continues to slowly rise from there. Doesn't feel good to
> be able to hit some kind of roof with the same traffic load.


You mentioned the loops a few times. Normally they should be detected pretty quickly by Max-Forwards counter checks, and additionally by Diversion header checks?
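
The Max-Forwards check mentioned here works roughly as sketched below. This follows the rule in RFC 3261 (reject with 483 once the counter reaches zero, otherwise decrement, and add the header with the value 70 if it is missing). It is an illustration of the principle only, not the code of Kamailio's own module; the function name and the dictionary representation of the headers are assumptions made for the example.

```python
DEFAULT_MAX_FORWARDS = 70  # value RFC 3261 recommends when the header is absent


def check_max_forwards(headers):
    """Return (status, headers).

    status is 483 when the request must not be forwarded any further,
    otherwise None together with the headers to use when forwarding.
    """
    value = headers.get("Max-Forwards")
    if value is None:
        # No hop budget yet: add one instead of rejecting the request.
        return None, {**headers, "Max-Forwards": str(DEFAULT_MAX_FORWARDS)}
    hops = int(value)
    if hops <= 0:
        return 483, headers  # "Too Many Hops"
    return None, {**headers, "Max-Forwards": str(hops - 1)}
```

A check like this only limits how often one and the same request is forwarded; a request that is answered by a fresh INVITE starts again with a full counter.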


> > If you've a real memory leak in shared memory then after a certain time
> > interval the server will report memory allocation errors. Otherwise I
> > don't think it's something to worry about.
>
> It does, if I don't make the limit higher. So say that I'm running on
> 32, then if I would hit that after some weeks uptime it would start
> reporting memory allocation errors in different parts of my config and
> stop doing important stuff. I also reproduced this assigning a small
> amount to a dev machine and then sending 20cps to the machine.
>
> On a test machine I have like 4 processes all using 600kb or so, then
> after 20 calls it'll go up to something like
>
> 31409 root 15 0 94672 1936 1052 R 0.0 0.7 0:00.00 kamailio
> 31408 root 15 0 94784 3072 2068 S 0.0 1.2 0:00.00 kamailio
> 31407 root 15 0 94784 3072 2068 S 0.0 1.2 0:00.00 kamailio
> 31406 root 25 0 94672 5428 4556 S 0.0 2.1 0:00.02 kamailio
>
> And go back in size only a little after 15-20 minutes or so (often a bit
> faster if load is low).


With the memory debugging you could dump all the allocations at runtime, but they are a bit hard to read for a non-developer. This way you could reproduce "call by call" how your server behaves and how the situation develops.


> If this is a leak, it'll be almost impossible to find. I can't run
> production with memlog or debug on, and in dev it's quite hard to
> reproduce it seems. Not sure what to expect. :)


If you had a leak in a commonly used code path, you would run out of memory pretty fast, within a few days. If your servers are stable (for some weeks or months) with the setting you use at the moment, I don't think there is much to worry about.
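
A back-of-the-envelope calculation shows why a leak on a common code path would show up within days. The sketch below uses the 128 MB pool and the 5 cps mentioned earlier in the thread; the leak size of 100 bytes per call is a purely hypothetical figure chosen for illustration, and the function name is made up.

```python
def days_until_exhausted(pool_mb, leak_bytes_per_call, calls_per_second):
    """Days until a shared memory pool is used up by a constant per-call leak."""
    pool_bytes = pool_mb * 1024 * 1024
    leak_per_second = leak_bytes_per_call * calls_per_second
    if leak_per_second <= 0:
        return float("inf")  # no leak, the pool never fills up
    return pool_bytes / leak_per_second / 86400  # 86400 seconds per day


# 128 MB pool, hypothetical 100-byte leak per call, 5 calls per second
print(round(days_until_exhausted(128, 100, 5), 1))  # 3.1
```

With the default 32 MB pool the same hypothetical leak would exhaust the memory in less than a day, so a server that stays up for weeks is unlikely to be leaking on every call.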


Regards,


Henning


Robin Vleij

Re: SHM memory usage Kamailio 1.4.4

Henning Westerholt wrote:

Hi Henning!

> btw, there is no need to re-compile the server just to change this
> setting, it's a normal daemon binary parameter. 128 MB should be really
> fine, given the load you quoted.

Ah, OK. Missed that. I had already set -m to 256, but wasn't aware
that this was the shared memory. So I'll let it run and see how it goes
and whether it keeps growing until it reaches 256.

> With today's memory sizes/prices you could use for example 512 MB, which
> should give you plenty of room even in really abnormal load conditions.
> And as it's shared, you'll have still plenty of room for e.g. the database.

Yep. It's basically to just use a huge amount and then normally you
wouldn't hit it. I understand from all of this that the memory "growing"
is really something that is not specific to my setup here.

> You mentioned the loops a few times, normally they should be pretty
> fast detected by max forward counter checks and additionally by
> diversion header checks?

Well, it's more like this. A customer sends an invite, which is really
to himself (failboat). So I send it to him (directly or via a PSTN
gateway, depending on the routing setup). Which causes (for example when
their PBX has a forwarding) a new invite to me, new call leg. Until one
of the two sides dies or is congested. That's normally not Kamailio, so
that's the good news. Only thing then is the memory usage after this
spike. I'm also running spike, so in the end I just send them 480's back.

> With the memory debugging you could dump all the allocations during
> runtime, but they are a bit hard to read for a non-developer. But this
> way you could reproduce "call by call" how your server behaves and how
> the situation develops.

I had this on and it was a LOT of info. :) I made some calls and it
showed that there's lots of stats and init stuff, for the rest nothing
shocking. But as you said, I'm not a dev so I might have missed shocking
errors. :)

> If you'd have a leak in a commonly used code path, then you'll run out of
> memory pretty fast, like in a few days. If your servers are stable (like
> some weeks or months) with the setting you use at the moment, I don't
> think there is much to worry about.

OK, I'll keep an eye on it. Will run with 128MB for now and see how it
grows with the load. I was looking at a 1.4.4 -> 1.5.0 upgrade, but that
was a bit more complicated than 1.3 -> 1.4 because of the database
layout and some changed modules. Have to write a failback plan before I
upgrade. Might also wait for 3.0.0, which sounds interesting.

/Robin

--
Robin Vleij
[hidden email]

Henning Westerholt

Re: SHM memory usage Kamailio 1.4.4

On Tuesday, 10 November 2009, Robin Vleij wrote:
> > With today's memory sizes/prices you could use for example 512 MB, which
> > should give you plenty of room even in really abnormal load conditions.
> > And as it's shared, you'll have still plenty of room for e.g. the
> > database.
>
> Yep. It's basically to just use a huge amount and then normally you
> wouldn't hit it. I understand from all of this that the memory "growing"
> is really something that is not specific to my setup here.


Hey Robin,


yes, and normally only a small fraction of the memory is used.


> > You mentioned the loops a few times, normally they should be pretty
> > fast detected by max forward counter checks and additionally by
> > diversion header checks?
>
> Well, it's more like this. A customer sends an invite, which is really
> to himself (failboat). So I send it to him (directly or via a PSTN
> gateway, depending on the routing setup). Which causes (for example when
> their PBX has a forwarding) a new invite to me, new call leg. Until one
> of both sides dies or is congested. That's normally not Kamailio, so
> that's the good news. Only thing then is the memory usage after this
> spike. I'm also running spike, so in the end I just send them 480's back.


OK, I understand. So it's more of a temporary overload condition that you face.


> > With the memory debugging you could dump all the allocations during
> > runtime, but they are a bit hard to read for a non-developer. But this
> > way you could reproduce "call by call" how your server behaves and how
> > the situation develops.
>
> I had this on and it was a LOT of info. :) I made some calls and it
> showed that there's lots of stats and init stuff, for the rest nothing
> shocking. But as you said, I'm not a dev so I might have missed shocking
> errors. :)


Yes, it's a lot of information to parse. But if you do a mem dump (as described e.g. here: http://www.kamailio.org/dokuwiki/doku.php/troubleshooting:memory) with memory debugging enabled, you can see where the allocations come from.


> > If you'd have a leak in a commonly used code path, then you'll run out of
> > memory pretty fast, like in a few days. If your servers are stable (like
> > some weeks or months) with the setting you use at the moment, I don't
> > think there is much to worry about.
>
> OK, I'll keep an eye on it. Will run with 128MB for now and see how it
> grows with the load. I was looking at a 1.4.4 -> 1.5.0 upgrade, but that
> was a bit more complicated than 1.3 -> 1.4 because of the database
> layout and some changed modules. Have to write a failback plan before I
> upgrade. Might also wait for 3.0.0, which sounds interesting.


We upgraded some of our production systems to 1.5 in the last few months, without any notable problems. Some of our other systems need more time before they can run on 1.5, especially because of the database changes you also mentioned. If you upgrade, make sure that you use the latest 1.5 version from the stable branch. 3.0 will indeed be interesting; I'm looking forward to it.


Regards,


Henning


Robin Vleij

Re: SHM memory usage Kamailio 1.4.4

On 11/10/09 12:26 PM, Henning Westerholt wrote:

Hi Henning!

>> Well, it's more like this. A customer sends an invite, which is really
>> to himself (failboat). So I send it to him (directly or via a PSTN
>> gateway, depending on the routing setup). Which causes (for example when
>> their PBX has a forwarding) a new invite to me, new call leg. Until one
>> of both sides dies or is congested. That's normally not Kamailio, so
>> that's the good news. Only thing then is the memory usage after this
>> spike. I'm also running spike, so in the end I just send them 480's back.
>
> OK, I understand. So it's more a temporary overload condition that you face.

Yes, exactly. And it's not really overload, the machine can easily
handle it, it's not loaded at all, even if customers do "interesting"
stuff. But it's just that the memory grows and never seems to shrink again.

> We did an upgrade to 1.5 in the last months on some of our production
> systems, without any notable problems. Some other systems we have need
> some more time before they can run on 1.5, especially because of the
> database changes you also mentioned. If you update make sure that you
> use the latest 1.5 version/ stable branch. 3.0 will be indeed
> interesting, looking forward to this.

Me too. I think I'll actually wait for 3.0.x or 3.1 and then make one
big upgrade. Night maintenance is not my favourite hobby so I'll make a
big step. I think the current 1.4.4 I'm running doesn't have any acute
problems or crashes so it's fine. :)

Thanks for your help and explanations (and of course your work on Kamailio)!

/Robin
