HomeGenie Forum

Development => Bug reporting => Topic started by: Jens on March 29, 2014, 11:47:35 AM

Title: R352 Raspian stops working after 2hrs
Post by: Jens on March 29, 2014, 11:47:35 AM
Hi Gene,

today I upgraded from R345 to R352 on Raspian

- stop HG
- remove the package
- clean up sudo apt-get autoremove
- update sudo apt-get update
- sudo gdebi ...

Within R352 I started from scartch with creating my config

I use X10, Hue and UpnP within Interfaces and deselect the APPs which I do not make use ogf like ZWave.

After Reboot I have HG running

~ $ sudo /etc/init.d/homegenie status
root      2303  2302  0 11:37 ?        00:00:00 sudo /usr/bin/mono HomeGenie.exe
root      2308  2303 51 11:37 ?        00:00:28 /usr/bin/mono HomeGenie.exe
[ ok ] homegenie is running.

After approx 2hrs HG stops working and is not any longer part of the processes running

The last reliable Version running for long is R345 in this setup

Please help

Thanks
Jens
Title: Re: R352 Raspian stops working after 2hrs
Post by: Gene on March 29, 2014, 12:00:23 PM
Hi Jens,

I'm running r352 and it's still up after about a day.
My setup is almost complete: X10, Z-Wave, Philips Hue, UPnP.
What you can do to help in finding the problem is:

1) install screen (sudo apt-get install screen)
2) stop HG as a service (sudo /etc/init.d/homegenie stop)
3) open a screen session (screen)
4) run hg in the screen session (cd /usr/local/bin/homegenie; sudo mono HomeGenie.exe)

With CTRL+A, D you can detach the screen session and logout.
When it happens that HG stops running again, you can attach the HG screen session with screen -x and see in the console the error that made HG stop.
Also enable logging, there might be useful informations in the log file to better understand what happened.

Cheers,
g.

Title: Re: R352 Raspian stops working after 2hrs
Post by: Gene on March 29, 2014, 12:04:25 PM
Before all of that also do a:

Code: [Select]
sudo apt-get update
sudo apt-get dist-upgrade
sudo rpi-update
sudo reboot

since new kernel drivers and mono 3.2.8 are now available.

Cheers,
g.
Title: Re: R352 Raspian stops working after 2hrs
Post by: Jens on March 29, 2014, 01:57:09 PM
Thanks for the tips. I turned on logging after it stopped last and will wait for the next stop.
Then I give it a stepwise approach (your second post first)

This was already the problem with R350/351 both stopped the same way, why this time I started from scratch with a new config instead of restoring the config from R345 into R352.

Regards
Jens
Title: Re: R352 Raspian stops working after 2hrs
Post by: bkenobi on March 29, 2014, 07:28:46 PM
FWIW, I updated from r351 (broken) to r352 about 2 days ago.  It is still responding for me.  I am running X10 only with most programs disabled.  I have only 4 modules set up for testing and it is not set up to send any commands since I have this setup filtered off from my main PLC network.

I just moved the CM15A off the filter and will leave it there for a while to see if it becomes unstable for some reason though that was not the case in previous builds.
Title: Re: R352 Raspian stops working after 2hrs
Post by: Jens on March 30, 2014, 11:02:45 AM
So I did perform another clean install yesterday evening
including the suggested commands in the install procedure

sudo apt-get update
sudo apt-get dist-upgrade
sudo rpi-update
sudo reboot

HG stopped shortly before midnight, these are the last entries in homegenie.log

2014-03-29T23:43:28.4959560+01:00   HomeAutomation.X10   A5   X10 Module   Meter.Watts   0
2014-03-29T23:43:58.6911650+01:00   HomeAutomation.X10   A5   X10 Module   Meter.Watts   0
2014-03-29T23:44:29.0925520+01:00   HomeAutomation.X10   A5   X10 Module   Meter.Watts   0
2014-03-29T23:44:59.2447200+01:00   HomeAutomation.X10   A5   X10 Module   Meter.Watts   0
2014-03-29T23:45:29.4372370+01:00   HomeAutomation.X10   A5   X10 Module   Meter.Watts   0
2014-03-29T23:45:59.5872970+01:00   HomeAutomation.X10   A5   X10 Module   Meter.Watts   0
2014-03-29T23:46:30.2473010+01:00   HomeAutomation.X10   A5   X10 Module   Meter.Watts   0


There's nothing in any log in /var/log/ around this time, HG was up for 3hrs40mins.

The light A5 has configured a value for VM Watts and is showing up like above every 30sec,
the other lights e.g. A10 which are not configured for VM Watts and still have 0 for that entry
do not show up in HG log - a hint?

Thanks

Regards
Jens


Title: Re: R352 Raspian stops working after 2hrs
Post by: badfinger on March 30, 2014, 01:09:59 PM
Hi Gene,
my HG stops faster.......it don't needs 2 hrs...
Something seems to be buggy.  I made all updates. It is a full new installation from the Raspi image.
Attached you'll find my complete log from the start to the end of a running session. I hope you'll find something.

stay tuned

Alex


Title: Re: R352 Raspian stops working after 2hrs
Post by: Jens on March 30, 2014, 03:25:15 PM
It's running <in Screen> still alive, but horribly slow. the Android APP cannot access HG anymore, my guess is due to timeout. I'll wait for the complete stop.

TOP tells me MONO is using less than 5% CPU and 15% MEM.

R
Jens
Title: Re: R352 Raspian stops working after 2hrs
Post by: Gene on March 30, 2014, 03:38:12 PM
If you're using Z-Wave, try unplugging the controller for 10 seconds and then replugging it.
Yesterday I caught into the same bug, but have no clue how to replicate it systematically.
You seem to hold the key of this =) There's something in your configuration that make the bug alive at every run.
More infos about your configurations and automation programs you're using could help.

Thanks,
g.
Title: Re: R352 Raspian stops working after 2hrs
Post by: badfinger on March 30, 2014, 04:56:59 PM
Hi Gene,
unplugging the z-wave stick for 10 seconds bring no better results. HG crashes again. Is it possible to get a more detailed log file?

stay tuned

Alex
Title: Re: R352 Raspian stops working after 2hrs
Post by: Jens on March 30, 2014, 06:49:03 PM
Hi Gene

I have no Zwave at all, the corresponding APPs are deactivated within HG UI

- Setup is Raspberry B Model with CM15Pro and 8GB UltraFlashCard with expanded filesystem
- Shairport running as the only additional program (I am not streaming during these test phases)
- no additional programs besides the ones you provide
- 5 additional wizard scripts in scenes (react on forwarded InfraRed Commands)
- Use of HG_Led and DHT11
- Interfaces X10 HC A and P, UPnP
- Hue lightning with 6 bulbs (updated trigger code from 5->6) and wheatherunderground activated
- Some devices make use of the scheduler, no additional entries within the scheduler itself

maybe I can post my config file later on - already to slow right now, currently HG ist still running in Screen getting slower and slower but still alive ;-)

Entries in Screen so far

pi@raspberrypi /usr/local/bin/homegenie $ sudo mono HomeGenie.exe                                                                                           
`string' does not contain a definition for `moon_phase'
`string' does not contain a definition for `moon_phase'
`string' does not contain a definition for `moon_phase'
`string' does not contain a definition for `moon_phase'
`string' does not contain a definition for `moon_phase'
`string' does not contain a definition for `moon_phase'
`string' does not contain a definition for `moon_phase'
`string' does not contain a definition for `location'


After a while I could enter the UI, within interfaces X10 and UpnP show now up as disabled, but the icons at the top of the webpage are still happy. Enabling/disabling has no effect anymore on the icons.


Thanks
Jens

Title: Re: R352 Raspian stops working after 2hrs
Post by: Gene on March 30, 2014, 07:56:45 PM
Thanks for the detailed report.
I was thinking the bug was about a problem in the SerialPortLib (used by ZWaveLib as well), but your setup is not using it at all.
I made a little change to SerialPortLib (replaced the ThreadPool with Thread with high priority, since on rpi z-wave was not preforming well, expecially in the initialization phase at startup).
Also made some relevant improvement to the UI and real-time server-to-client events.
Published all of that as r356:

https://sourceforge.net/projects/homegenie/files/testing/

this is not likely solve the bug, but we never know (damn Murphy!) =)

Cheers,
g.

Title: Re: R352 Raspian stops working after 2hrs
Post by: badfinger on March 30, 2014, 08:04:20 PM
Hi Gene,
hmmm i'm wondering. Since i deactivate my CM15pro it seems to be working.  ;D  My z-wave is running good. Is there something with the X10 libary?

stay tuned

Alex
Title: Re: R352 Raspian stops working after 2hrs
Post by: Gene on March 30, 2014, 08:18:04 PM
I will check :) for now I'll let r356 run for a while with complete setup.
If you have a little time, upgrade to r356, I need more testers =D

g.
Title: Re: R352 Raspian stops working after 2hrs
Post by: Jens on March 30, 2014, 08:32:50 PM
I am about to install 356 on a separate raspberry.

I checked my log, 336 was the last version running for days.
Then I went to 345 which is already slowing down a little bit,
but I cannot remember a complete stop. I did not try any version
between 336 and 345.
350 got me into all the trouble as I posted already.

Cheers
Jens

Title: Re: R352 Raspian stops working after 2hrs
Post by: Gene on March 30, 2014, 08:36:21 PM
Hi Jens,

based on the badfinger suggestion, I made a little modification to the XTenLib.
I removed some "lock" instructions.
I just re-uploaded the r356 package, so download it again.
Let's keep finger crossed!
Next I'll try to see diffs between 336 and current release, perhaps it will help in figuring out the problem.

Cheers,
g.
Title: Re: R352 Raspian stops working after 2hrs
Post by: Jens on March 30, 2014, 08:38:13 PM
You gave us a password enabled build :-)

Can I just delete a file on Unix Level?

added later: I just removed the systemconfig.xml followed by factory reset :-))
Title: Re: R352 Raspian stops working after 2hrs
Post by: Gene on March 30, 2014, 09:06:48 PM
What do you mean?
Did you encounter any problem installing it?

g.
Title: Re: R352 Raspian stops working after 2hrs
Post by: Jens on March 30, 2014, 09:08:45 PM
Installed!

To your question, no, installing was fine, but accessing it via http brought up the username/password dialog.

X10 seems to be very fast now, it took a while on the latest versions until the icon of the bulb went like "on" or "off" - now this is immediate.

Let's see how long it runs - maybe the story started with improving X10, there's a thread about it how long it takes to turn on/off X10 devices and changes were made - sorry just guessing for now.

The other raspberry is still running very very very slow ;-)
Title: Re: R352 Raspian stops working after 2hrs
Post by: Jens on March 30, 2014, 09:36:50 PM
I just accessed HG from iPad and found some noticeable difference.

From entering the IP in the browser until CONTROL gets active and selectable it takes up to 5 times longer on versions 345 and up versus 336, even directly after HG reboot.

Maybe memory leak, because the main indicator is still its getting slower and slower and the Android app times out after approx 2hrs of uptime, web browser acces still works but very slow until it stops. First thing noticeable within the UI is icons do net fresh up after actions or it takes very long.

356 seems to be improved for X10.

MIG changes of version 345?

Cheers
Jens


Title: Re: R352 Raspian stops working after 2hrs
Post by: Jens on March 30, 2014, 09:50:01 PM
Now it already takes 15 times longer to access 356 than 336

352 on the other raspberry now times out on ipad but is stll accessible from pc tremendously slow.

different timeout settings in different browsers I guess
Title: Re: R352 Raspian stops working after 2hrs
Post by: Jens on March 30, 2014, 10:38:58 PM
Some minutes ago HG 356 stopped responding to Browser request via iPAD, still very very slow via PC Browser.

R352 might take a little longer to stop completely - maybe during tonight, I keep you posted early morning.

The more I think about it I think it is the webservice itself
Title: Re: R352 Raspian stops working after 2hrs
Post by: Gene on March 30, 2014, 11:28:59 PM
Do you keep the UI opened all the time?
This is something I didn't try yet.

g.
Title: Re: R352 Raspian stops working after 2hrs
Post by: Gene on March 31, 2014, 01:02:57 AM
Hi Jens,

I don't think the problem is the webservice.
Since I was not able to really identify the problem I just tried to make some major code cleanup in XTenLib.
This new version use less cpu and has no monitors nor locks. Also got rid of some unneeded Thread.Sleeps.
I replaced the r356 deb with the new one. 

Cheers,
g.
Title: Re: R352 Raspian stops working after 2hrs
Post by: Jens on March 31, 2014, 07:36:06 AM
Hi Gene,

no, the UI is not kept open. Although maybe Android keeps the connection even if in background - not 100% sure.

By accident I turned off the power yesterday evening from the 352 "screen device" - have to do it again.

Attached is my config,
I did remove the Wheatherunderground Key
and the Hue Bridge password

The 336 version which I installed yesterday evening is running like a charm and still as fast as expected.

I will now install your new version

Thanks
Jens
Title: Re: R352 Raspian stops working after 2hrs
Post by: Jens on March 31, 2014, 08:06:04 AM
Hi Gene,

after Installation - quite fast - after a while again as slow as all the others.

But maybe I found something, because I deleted all 3 UpnP devices within my config
and HG started to be fast again.

What I noticed between 336 and 356 is the following.

The UpnP devices are autodiscovered, I add them as module within in a group called Media.
In 336 the devices show up with their Name (and ID and type) in 356 and all the other versions (I tested) later than 336, they don't have their Name (just ID and type).
In 336 the Name is automatically discovered, too.

Is HG trying to process something in regards to UpnP in the background which slows it down?

Keep you posted later today if it stays fast w/o UpnP devices.

Cheers
Jens
Title: Re: R352 Raspian stops working after 2hrs
Post by: Gene on March 31, 2014, 02:26:38 PM
I'm now running hg with UPnP disabled.

Looking at the UPnP library source code I found out that it doesn't offer a way of stopping the service once activated, so I had to disable it from systemconfig.xml and restart hg.

What I can see now is that the cpu usage is reduced by 1/3 (3%-5% cpu and 14%-16% mem).

I'll let hg run for a couple of days, if cpu and memory usage remain stable, probably you found where the problem is.

Usually, with r35x releases, I experienced after about 12 hours an increase in cpu usage from 4%-5% to 8%-10% and also memory increase from 13%-15% to 18%-20%.

Will we spot the bug? =D
Title: Re: R352 Raspian stops working after 2hrs
Post by: Jens on March 31, 2014, 08:28:21 PM
Hi Gene,

just came back and HG is dead - again. So uptime was around max 10hrs.

I did not disable UpnP in the Interfaces setup just erase the devices from the group this morning, I have done that now followed by a reboot.

In addition to what you spotted in the library please see the screenshots attached.

R336 includes "DENON" the UpnP Name of that device and R356 lacks this entry. So I would
assume that there was a code change.


Have you tried my config file?

Thanks
Jens
Title: Re: R352 Raspian stops working after 2hrs
Post by: Jan on March 31, 2014, 09:19:55 PM
I have to correct my earlier post in the other thread...

My raspberry also went dead today with r356 for the first time.
Haven't had any problems with earlier releases, though.

No changes/configurations made after the update.
But I always had upnp disabled before and only x10 on.
After the update upnp of course was active by default. But I didn't add any devices.

Title: Re: R352 Raspian stops working after 2hrs
Post by: Gene on March 31, 2014, 10:06:17 PM
I can almost confirm that is an UPnP driver related problem. If active on startup, it will keep doing some work even if disabled later.
I have the source code of this library, but I am not going to look into this for now.
New release (that I will publish for testing tomorrow perhaps) has got upnp disabled by default, also it has lower cpu usage with an average of 5%, 3% min and 8% max, exception made for some webservice calls that will pump cpu to 90% for a couple of seconds. Also memory is stable to 15%.
It has been running for just 8 hours, so let's see tomorrow if things are still ok.


Title: Re: R352 Raspian stops working after 2hrs
Post by: Jens on April 01, 2014, 07:31:43 AM
Up to now the latest R356 is still running with upnp disabled.

Please bear in mind, that upnp has been working w/o problems in R336 - at least I did not detect any and I am pretty sure it was running for days. I can retest that in parallel if it would help you.


Let's keep fingers crossed that HG ist still alive this afternoon.

Something else in this build, occasionally the icons do no refresh, e.g. manual trigger on shutter to be up and shutter (x10) goes up but icon still shows shutter to be down. Even if you reload the page it stays down. Same happens to x10 light bulbs.


Cheers
Jens
Title: Re: R352 Raspian stops working after 2hrs
Post by: Jens on April 01, 2014, 08:33:50 PM
Still up and running so upnp seems to be the guilty

cheers
Jens
Title: Re: R352 Raspian stops working after 2hrs
Post by: Gene on April 01, 2014, 08:46:23 PM
Here too.. 1 day and 8 hrs up and stable.
For now I will ship hg with upnp disabled by default and an "experimental" advice in the interface settings.
With time I'll try to debug the library and fix it or look for a new upnp library to integrate in hg.

Thanks for reporting,
g.
Title: Re: R352 Raspian stops working after 2hrs
Post by: Jens on April 01, 2014, 09:07:33 PM
Hi Gene

unfortunate I can report another bug in R356. X10 stopped now working, the x10 icon ist still highlighted, and bulbs or shutter change their status but no command is beeing sent to cm15.

Maybe something went wrong in improving x10 whilst the issue in fact was upnp.

Thanks
Jens
Title: Re: R352 Raspian stops working after 2hrs
Post by: Gene on April 01, 2014, 09:30:09 PM
Hi Jens,

in r357 I rolled back X10 and Z-Wave drivers to previous release because I also noticed strange behavior expecially with X10 missing some commands.

g.
Title: Re: R352 Raspian stops working after 2hrs
Post by: Jens on April 01, 2014, 09:30:49 PM
update

after disabling and enabling x10 in interfaces all X10 commands were fired to cm15 immediately, they seem to be queued in HG
Title: Re: R352 Raspian stops working after 2hrs
Post by: Jens on April 01, 2014, 09:32:42 PM
thanks, will give it a try by tomorrow

Regards
Jens