HomeGenie Forum
Development => Bug reporting => Topic started by: Jens on March 29, 2014, 11:47:35 AM
-
Hi Gene,
today I upgraded from R345 to R352 on Raspian
- stop HG
- remove the package
- clean up sudo apt-get autoremove
- update sudo apt-get update
- sudo gdebi ...
Within R352 I started from scartch with creating my config
I use X10, Hue and UpnP within Interfaces and deselect the APPs which I do not make use ogf like ZWave.
After Reboot I have HG running
~ $ sudo /etc/init.d/homegenie status
root 2303 2302 0 11:37 ? 00:00:00 sudo /usr/bin/mono HomeGenie.exe
root 2308 2303 51 11:37 ? 00:00:28 /usr/bin/mono HomeGenie.exe
[ ok ] homegenie is running.
After approx 2hrs HG stops working and is not any longer part of the processes running
The last reliable Version running for long is R345 in this setup
Please help
Thanks
Jens
-
Hi Jens,
I'm running r352 and it's still up after about a day.
My setup is almost complete: X10, Z-Wave, Philips Hue, UPnP.
What you can do to help in finding the problem is:
1) install screen (sudo apt-get install screen)
2) stop HG as a service (sudo /etc/init.d/homegenie stop)
3) open a screen session (screen)
4) run hg in the screen session (cd /usr/local/bin/homegenie; sudo mono HomeGenie.exe)
With CTRL+A, D you can detach the screen session and logout.
When it happens that HG stops running again, you can attach the HG screen session with screen -x and see in the console the error that made HG stop.
Also enable logging, there might be useful informations in the log file to better understand what happened.
Cheers,
g.
-
Before all of that also do a:
sudo apt-get update
sudo apt-get dist-upgrade
sudo rpi-update
sudo reboot
since new kernel drivers and mono 3.2.8 are now available.
Cheers,
g.
-
Thanks for the tips. I turned on logging after it stopped last and will wait for the next stop.
Then I give it a stepwise approach (your second post first)
This was already the problem with R350/351 both stopped the same way, why this time I started from scratch with a new config instead of restoring the config from R345 into R352.
Regards
Jens
-
FWIW, I updated from r351 (broken) to r352 about 2 days ago. It is still responding for me. I am running X10 only with most programs disabled. I have only 4 modules set up for testing and it is not set up to send any commands since I have this setup filtered off from my main PLC network.
I just moved the CM15A off the filter and will leave it there for a while to see if it becomes unstable for some reason though that was not the case in previous builds.
-
So I did perform another clean install yesterday evening
including the suggested commands in the install procedure
sudo apt-get update
sudo apt-get dist-upgrade
sudo rpi-update
sudo reboot
HG stopped shortly before midnight, these are the last entries in homegenie.log
2014-03-29T23:43:28.4959560+01:00 HomeAutomation.X10 A5 X10 Module Meter.Watts 0
2014-03-29T23:43:58.6911650+01:00 HomeAutomation.X10 A5 X10 Module Meter.Watts 0
2014-03-29T23:44:29.0925520+01:00 HomeAutomation.X10 A5 X10 Module Meter.Watts 0
2014-03-29T23:44:59.2447200+01:00 HomeAutomation.X10 A5 X10 Module Meter.Watts 0
2014-03-29T23:45:29.4372370+01:00 HomeAutomation.X10 A5 X10 Module Meter.Watts 0
2014-03-29T23:45:59.5872970+01:00 HomeAutomation.X10 A5 X10 Module Meter.Watts 0
2014-03-29T23:46:30.2473010+01:00 HomeAutomation.X10 A5 X10 Module Meter.Watts 0
There's nothing in any log in /var/log/ around this time, HG was up for 3hrs40mins.
The light A5 has configured a value for VM Watts and is showing up like above every 30sec,
the other lights e.g. A10 which are not configured for VM Watts and still have 0 for that entry
do not show up in HG log - a hint?
Thanks
Regards
Jens
-
Hi Gene,
my HG stops faster.......it don't needs 2 hrs...
Something seems to be buggy. I made all updates. It is a full new installation from the Raspi image.
Attached you'll find my complete log from the start to the end of a running session. I hope you'll find something.
stay tuned
Alex
-
It's running <in Screen> still alive, but horribly slow. the Android APP cannot access HG anymore, my guess is due to timeout. I'll wait for the complete stop.
TOP tells me MONO is using less than 5% CPU and 15% MEM.
R
Jens
-
If you're using Z-Wave, try unplugging the controller for 10 seconds and then replugging it.
Yesterday I caught into the same bug, but have no clue how to replicate it systematically.
You seem to hold the key of this =) There's something in your configuration that make the bug alive at every run.
More infos about your configurations and automation programs you're using could help.
Thanks,
g.
-
Hi Gene,
unplugging the z-wave stick for 10 seconds bring no better results. HG crashes again. Is it possible to get a more detailed log file?
stay tuned
Alex
-
Hi Gene
I have no Zwave at all, the corresponding APPs are deactivated within HG UI
- Setup is Raspberry B Model with CM15Pro and 8GB UltraFlashCard with expanded filesystem
- Shairport running as the only additional program (I am not streaming during these test phases)
- no additional programs besides the ones you provide
- 5 additional wizard scripts in scenes (react on forwarded InfraRed Commands)
- Use of HG_Led and DHT11
- Interfaces X10 HC A and P, UPnP
- Hue lightning with 6 bulbs (updated trigger code from 5->6) and wheatherunderground activated
- Some devices make use of the scheduler, no additional entries within the scheduler itself
maybe I can post my config file later on - already to slow right now, currently HG ist still running in Screen getting slower and slower but still alive ;-)
Entries in Screen so far
pi@raspberrypi /usr/local/bin/homegenie $ sudo mono HomeGenie.exe
`string' does not contain a definition for `moon_phase'
`string' does not contain a definition for `moon_phase'
`string' does not contain a definition for `moon_phase'
`string' does not contain a definition for `moon_phase'
`string' does not contain a definition for `moon_phase'
`string' does not contain a definition for `moon_phase'
`string' does not contain a definition for `moon_phase'
`string' does not contain a definition for `location'
After a while I could enter the UI, within interfaces X10 and UpnP show now up as disabled, but the icons at the top of the webpage are still happy. Enabling/disabling has no effect anymore on the icons.
Thanks
Jens
-
Thanks for the detailed report.
I was thinking the bug was about a problem in the SerialPortLib (used by ZWaveLib as well), but your setup is not using it at all.
I made a little change to SerialPortLib (replaced the ThreadPool with Thread with high priority, since on rpi z-wave was not preforming well, expecially in the initialization phase at startup).
Also made some relevant improvement to the UI and real-time server-to-client events.
Published all of that as r356:
https://sourceforge.net/projects/homegenie/files/testing/
this is not likely solve the bug, but we never know (damn Murphy!) =)
Cheers,
g.
-
Hi Gene,
hmmm i'm wondering. Since i deactivate my CM15pro it seems to be working. ;D My z-wave is running good. Is there something with the X10 libary?
stay tuned
Alex
-
I will check :) for now I'll let r356 run for a while with complete setup.
If you have a little time, upgrade to r356, I need more testers =D
g.
-
I am about to install 356 on a separate raspberry.
I checked my log, 336 was the last version running for days.
Then I went to 345 which is already slowing down a little bit,
but I cannot remember a complete stop. I did not try any version
between 336 and 345.
350 got me into all the trouble as I posted already.
Cheers
Jens
-
Hi Jens,
based on the badfinger suggestion, I made a little modification to the XTenLib.
I removed some "lock" instructions.
I just re-uploaded the r356 package, so download it again.
Let's keep finger crossed!
Next I'll try to see diffs between 336 and current release, perhaps it will help in figuring out the problem.
Cheers,
g.
-
You gave us a password enabled build :-)
Can I just delete a file on Unix Level?
added later: I just removed the systemconfig.xml followed by factory reset :-))
-
What do you mean?
Did you encounter any problem installing it?
g.
-
Installed!
To your question, no, installing was fine, but accessing it via http brought up the username/password dialog.
X10 seems to be very fast now, it took a while on the latest versions until the icon of the bulb went like "on" or "off" - now this is immediate.
Let's see how long it runs - maybe the story started with improving X10, there's a thread about it how long it takes to turn on/off X10 devices and changes were made - sorry just guessing for now.
The other raspberry is still running very very very slow ;-)
-
I just accessed HG from iPad and found some noticeable difference.
From entering the IP in the browser until CONTROL gets active and selectable it takes up to 5 times longer on versions 345 and up versus 336, even directly after HG reboot.
Maybe memory leak, because the main indicator is still its getting slower and slower and the Android app times out after approx 2hrs of uptime, web browser acces still works but very slow until it stops. First thing noticeable within the UI is icons do net fresh up after actions or it takes very long.
356 seems to be improved for X10.
MIG changes of version 345?
Cheers
Jens
-
Now it already takes 15 times longer to access 356 than 336
352 on the other raspberry now times out on ipad but is stll accessible from pc tremendously slow.
different timeout settings in different browsers I guess
-
Some minutes ago HG 356 stopped responding to Browser request via iPAD, still very very slow via PC Browser.
R352 might take a little longer to stop completely - maybe during tonight, I keep you posted early morning.
The more I think about it I think it is the webservice itself
-
Do you keep the UI opened all the time?
This is something I didn't try yet.
g.
-
Hi Jens,
I don't think the problem is the webservice.
Since I was not able to really identify the problem I just tried to make some major code cleanup in XTenLib.
This new version use less cpu and has no monitors nor locks. Also got rid of some unneeded Thread.Sleeps.
I replaced the r356 deb with the new one.
Cheers,
g.
-
Hi Gene,
no, the UI is not kept open. Although maybe Android keeps the connection even if in background - not 100% sure.
By accident I turned off the power yesterday evening from the 352 "screen device" - have to do it again.
Attached is my config,
I did remove the Wheatherunderground Key
and the Hue Bridge password
The 336 version which I installed yesterday evening is running like a charm and still as fast as expected.
I will now install your new version
Thanks
Jens
-
Hi Gene,
after Installation - quite fast - after a while again as slow as all the others.
But maybe I found something, because I deleted all 3 UpnP devices within my config
and HG started to be fast again.
What I noticed between 336 and 356 is the following.
The UpnP devices are autodiscovered, I add them as module within in a group called Media.
In 336 the devices show up with their Name (and ID and type) in 356 and all the other versions (I tested) later than 336, they don't have their Name (just ID and type).
In 336 the Name is automatically discovered, too.
Is HG trying to process something in regards to UpnP in the background which slows it down?
Keep you posted later today if it stays fast w/o UpnP devices.
Cheers
Jens
-
I'm now running hg with UPnP disabled.
Looking at the UPnP library source code I found out that it doesn't offer a way of stopping the service once activated, so I had to disable it from systemconfig.xml and restart hg.
What I can see now is that the cpu usage is reduced by 1/3 (3%-5% cpu and 14%-16% mem).
I'll let hg run for a couple of days, if cpu and memory usage remain stable, probably you found where the problem is.
Usually, with r35x releases, I experienced after about 12 hours an increase in cpu usage from 4%-5% to 8%-10% and also memory increase from 13%-15% to 18%-20%.
Will we spot the bug? =D
-
Hi Gene,
just came back and HG is dead - again. So uptime was around max 10hrs.
I did not disable UpnP in the Interfaces setup just erase the devices from the group this morning, I have done that now followed by a reboot.
In addition to what you spotted in the library please see the screenshots attached.
R336 includes "DENON" the UpnP Name of that device and R356 lacks this entry. So I would
assume that there was a code change.
Have you tried my config file?
Thanks
Jens
-
I have to correct my earlier post in the other thread...
My raspberry also went dead today with r356 for the first time.
Haven't had any problems with earlier releases, though.
No changes/configurations made after the update.
But I always had upnp disabled before and only x10 on.
After the update upnp of course was active by default. But I didn't add any devices.
-
I can almost confirm that is an UPnP driver related problem. If active on startup, it will keep doing some work even if disabled later.
I have the source code of this library, but I am not going to look into this for now.
New release (that I will publish for testing tomorrow perhaps) has got upnp disabled by default, also it has lower cpu usage with an average of 5%, 3% min and 8% max, exception made for some webservice calls that will pump cpu to 90% for a couple of seconds. Also memory is stable to 15%.
It has been running for just 8 hours, so let's see tomorrow if things are still ok.
-
Up to now the latest R356 is still running with upnp disabled.
Please bear in mind, that upnp has been working w/o problems in R336 - at least I did not detect any and I am pretty sure it was running for days. I can retest that in parallel if it would help you.
Let's keep fingers crossed that HG ist still alive this afternoon.
Something else in this build, occasionally the icons do no refresh, e.g. manual trigger on shutter to be up and shutter (x10) goes up but icon still shows shutter to be down. Even if you reload the page it stays down. Same happens to x10 light bulbs.
Cheers
Jens
-
Still up and running so upnp seems to be the guilty
cheers
Jens
-
Here too.. 1 day and 8 hrs up and stable.
For now I will ship hg with upnp disabled by default and an "experimental" advice in the interface settings.
With time I'll try to debug the library and fix it or look for a new upnp library to integrate in hg.
Thanks for reporting,
g.
-
Hi Gene
unfortunate I can report another bug in R356. X10 stopped now working, the x10 icon ist still highlighted, and bulbs or shutter change their status but no command is beeing sent to cm15.
Maybe something went wrong in improving x10 whilst the issue in fact was upnp.
Thanks
Jens
-
Hi Jens,
in r357 I rolled back X10 and Z-Wave drivers to previous release because I also noticed strange behavior expecially with X10 missing some commands.
g.
-
update
after disabling and enabling x10 in interfaces all X10 commands were fired to cm15 immediately, they seem to be queued in HG
-
thanks, will give it a try by tomorrow
Regards
Jens