[Chilli] 100% cpu problem

David Bird david at coova.com
Thu Oct 7 15:48:49 UTC 2010


I've also been fixing some DNS-related issues that could cause
crashes and could be a problem if someone were, for example, tunneling
traffic in DNS packets past the captive portal. See the subversion
trunk.
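
On the getnextattr guard quoted below, here is a minimal standalone
sketch of why the length check matters. It is not the CoovaChilli
source: find_attr and its signature are made up for illustration, and it
assumes the loop advances offset by each attribute's length byte, which
in the RADIUS attribute layout (type, length, value) also counts the two
header bytes. Under that assumption a length of 0 would leave offset
unchanged and "while (offset < len)" would spin forever, so the sketch
rejects anything shorter than 2 and anything that runs past the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the CoovaChilli function.
 * Walks a buffer of RADIUS-style attributes: 1 type byte, 1 length byte
 * (the length covers the whole attribute, header included), then the value.
 * Returns the offset of the first attribute of the given type, or -1 if
 * it is not found or the data is malformed. */
int find_attr(const uint8_t *buf, size_t len, uint8_t type) {
  size_t offset = 0;

  /* Need at least the 2 header bytes to read type and length. */
  while (offset + 2 <= len) {
    uint8_t t = buf[offset];
    uint8_t l = buf[offset + 1];

    /* The guard from the thread: l < 2 is invalid, and l == 0 in
     * particular would never advance offset (endless loop). */
    if (t == 0 || l < 2)
      return -1;

    /* Extra assumption: also reject attributes that overrun the buffer. */
    if (l > len - offset)
      return -1;

    if (t == type)
      return (int)offset;

    offset += l; /* always advances by at least 2 here */
  }
  return -1;
}
```

With the bytes {1, 0, 4, 2}, for example, the walk returns -1 at the
first attribute instead of looping on it.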

On Thu, 2010-10-07 at 11:35 +0200, Marco Simioni wrote:
> I'll let you know.
> 
> I printed to log both offset and len.
> 
> Let's see what happens.
> 
> Regards.
> 
> 2010/10/7 David Bird <david at coova.com>:
> > In getnextattr offset should always increment, and both offset and len
> > are unsigned. Though, to be safe (to protect against a bad radius
> > packet), we could change:
> >
> > -    if (t->t == 0)
> > +    if (t->t == 0 || t->l < 2)
> >       return -1;
> >
> > On Thu, 2010-10-07 at 10:54 +0200, Marco Simioni wrote:
> >> Thank you, Alberto, for your answer.
> >>
> >> I debugged with gdb again, and saw it was in
> >> "radius_getnextattr" again.
> >>
> >> I'm now looking into the "radius_getnextattr()" function.
> >>
> >> I see there is a while loop that could potentially become an endless
> >> loop: "while (offset < len)".
> >>
> >> I inserted some debug messages; let's see the next time it happens.
> >>
> >> Then I'll try your suggestions.
> >>
> >> Thank you.
> >>
> >> Regards,
> >>
> >> Marco
> >>
> >> 2010/10/6 Alberto Bellettato <albesvs at yahoo.it>:
> >> > Have you tried temporarily removing the "/etc/rc2.d/S20chilli radconfig"
> >> > script from your cron table? It could help isolate the problem.
> >> > Also, have you enabled ssl or redir in your chilli config?
> >> >
> >> > ----- Original Message ----- From: "Marco Simioni" <m.simioni at gmail.com>
> >> > To: <chilli at coova.org>
> >> > Sent: Wednesday, October 06, 2010 11:26 AM
> >> > Subject: Re: [Chilli] 100% cpu problem
> >> >
> >> >
> >> > No news.
> >> >
> >> > It still happens.
> >> >
> >> > I'm using the latest SVN version.
> >> >
> >> > How can I investigate the problem?
> >> >
> >> > 2010/9/23 Marco Simioni <m.simioni at gmail.com>:
> >> >>
> >> >> It happened again: I had 100% CPU after days of uptime.
> >> >>
> >> >> This time, I was able to identify the following:
> >> >>
> >> >> 1) Thanks to the "sar" utility, I was able to narrow the CPU spike
> >> >> down to between 09:45 and 10:05.
> >> >>
> >> >> 09:05:02  CPU  %user  %nice  %system  %iowait  %steal  %idle
> >> >> 09:45:02  all   0,00   0,00     0,04     4,26    0,00  95,70
> >> >> 09:55:02  all   2,51   0,00     0,08     3,47    0,00  93,94
> >> >> 10:05:02  all  98,32   0,00     1,68     0,00    0,00   0,00
> >> >>
> >> >> 2) In syslog, the messages i see are:
> >> >>
> >> >> Sep 23 09:51:10 izc coova-chilli[939]: chilli.c: 3402: DHCP addr
> >> >> released by MAC=90-84-0D-D2-00-2D IP=0.0.0.0
> >> >> Sep 23 09:51:11 izc coova-chilli[939]: chilli.c: 3248: New DHCP
> >> >> request from MAC=90-84-0D-D2-00-2D
> >> >> Sep 23 09:52:02 izc coova-chilli[939]: chilli.c: 3248: New DHCP
> >> >> request from MAC=00-0E-6A-7A-AB-9C
> >> >> Sep 23 09:52:02 izc coova-chilli[939]: chilli.c: 3209: Client
> >> >> MAC=00-0E-6A-7A-AB-9C assigned IP 10.1.0.73
> >> >> Sep 23 09:55:02 izc CRON[9452]: (root) CMD (command -v debian-sa1 >
> >> >> /dev/null && debian-sa1 1 1)
> >> >> Sep 23 10:00:03 izc CRON[9455]: (root) CMD (/etc/rc2.d/S20chilli
> >> >> radconfig)
> >> >> Sep 23 10:05:01 izc CRON[9461]: (root) CMD (command -v debian-sa1 >
> >> >> /dev/null && debian-sa1 1 1)
> >> >>
> >> >> 3) I tried to attach to the chilli process with "gdb". Nothing
> >> >> happened. I didn't know what to do, so I tried a single step and got
> >> >> the following:
> >> >>
> >> >> (gdb) step
> >> >> Single stepping until exit from function radius_getnextattr, which has
> >> >> no line number information.
> >> >>
> >> >> then nothing else.
> >> >>
> >> >> 4) Tried to attach with "strace"
> >> >>
> >> >> root at izc:# strace -p 939
> >> >> Process 939 attached - interrupt to quit
> >> >>
> >> >> and nothing happened.
> >> >>
> >> >> I then had to reboot to let customers surf.
> >> >>
> >> >> What can I do next time?
> >> >>
> >> >> Could the "radconfig" entry in syslog and the "radius_getnextattr"
> >> >> message from gdb point to something?
> >> >>
> >> >> Keep in mind that the radius server is a proprietary one; it is not
> >> >> freeradius or anything similar.
> >> >>
> >> >> Is there something I can do next time with gdb and/or strace?
> >> >>
> >> >> Thank you again.
> >> >>
> >> >> 2010/7/27 David Bird <david at coova.com>:
> >> >>>
> >> >>> Wichert asked a good question; are you using SSL features of
> >> >>> CoovaChilli?
> >> >>>
> >> >>> For info on gdb, you can google for it, of course. Here is a quick howto
> >> >>> page:
> >> >>> http://www.freebsd.org/doc/en/books/developers-handbook/debugging.html
> >> >>>
> >> >>> For strace, use "strace -p <pid>" and you will see the system calls
> >> >>> being executed - if it is using 100%, there must be a runaway loop
> >> >>> occurring.
> >> >>>
> >> >>> David
> >> >>>
> >> >>> On Tue, 2010-07-27 at 09:20 +0200, Marco Simioni wrote:
> >> >>>>
> >> >>>> It happened 3 times in a month,
> >> >>>>
> >> >>>> not immediately but after some days of regular work.
> >> >>>>
> >> >>>> The "top" command said chilli was consuming 100% CPU.
> >> >>>>
> >> >>>> Customers reported that they could not get DHCP and then the login page.
> >> >>>>
> >> >>>> I will try to use chilli_query when it happens again.
> >> >>>>
> >> >>>> How can I attach with gdb or strace? Can you point me to some
> >> >>>> documentation?
> >> >>>>
> >> >>>> Can I run it in debug mode only in console mode, with -fd, or also as
> >> >>>> a service?
> >> >>>>
> >> >>>> Thank you in advance.
> >> >>>>
> >> >>>> regards,
> >> >>>>
> >> >>>> Marco
> >> >>>>
> >> >>>> 2010/7/27 David Bird <david at coova.com>:
> >> >>>> > How quickly does this start to happen? Immediately? After how long?
> >> >>>> >
> >> >>>> > Is chilli also not working during this time? Does chilli_query hang?
> >> >>>> >
> >> >>>> > Are you able to attach gdb or use strace to get more info?
> >> >>>> >
> >> >>>> > If able, you can try running in debug mode for additional log
> >> >>>> > information?
> >> >>>> >
> >> >>>> > Thanks,
> >> >>>> > David
> >> >>>> >
> >> >>>> >
> >> >>>> > On Tue, 2010-07-27 at 08:55 +0200, Marco Simioni wrote:
> >> >>>> >> Hi all, my customer is reporting a CPU problem.
> >> >>>> >>
> >> >>>> >> Chilli ends up consuming all of the processor, going to 100%.
> >> >>>> >>
> >> >>>> >> It is a brand new setup, built on a VMware ESXi virtual machine on an
> >> >>>> >> HP ML115,
> >> >>>> >> 1.8GHz CPU allocated,
> >> >>>> >> 512MB RAM allocated,
> >> >>>> >> coova-chilli 1.2.2,
> >> >>>> >> Ubuntu 9.10 ( 2.6.31-14-server ).
> >> >>>> >>
> >> >>>> >> It has happened about three times; each time I solved it with a reboot.
> >> >>>> >>
> >> >>>> >> Very few clients, < 10.
> >> >>>> >>
> >> >>>> >> Suggestions?
> >> >>>> >>
> >> >>>> >> How can I investigate and understand when it happens?
> >> >>>> >>
> >> >>>> >> Thanks in advance.
> >> >>>> >>
> >> >>>> >> Best regards,
> >> >>>> >>
> >> >>>> >> Marco
> >> >>>> >> _______________________________________________
> >> >>>> >> Chilli mailing list
> >> >>>> >> Chilli at coova.org
> >> >>>> >> http://lists.coova.org/cgi-bin/mailman/listinfo/chilli
> >> >>>> >
> >> >>>> >
> >> >>>> >
> >> >>>
> >> >>>
> >> >>>
> >> >>
> >> >
> >
> >
> >
