Lights become unresponsive

Hi, I’m working on a Python-based web app at home, using LifxLAN with Flask as the web server. I’ve noticed that after a while, the bulbs don’t respond to various messages.

For example, hitting a URL on my app requests power-on status from all of the bulbs, and turns on each one that’s off. It then sets some colors. However, if the app’s been sitting idle, it doesn’t get past the effort to query the power status, as some of the bulbs don’t respond to the message.

I also run a background thread, where the app re-discovers all of the lights every 2 minutes. Within that thread, I see errors with bulbs not responding to the State Label message.

Is there something I’m supposed to do to wake the bulbs up? I did notice that if I send messages to them to all power off, and then power on again, they start to work ok. Also, as long as I keep using the app, they respond fine. It’s after a period of idle time, seems like more than 10 minutes or so, that the bulbs become unresponsive, even if they’re on.

I also noticed that the Android app never has that problem. Does anyone know what that app does to stay in communication with the bulbs?

I haven’t observed the same behaviour here, using the same library, but then I tend to only use it to update, rather than requesting status first. Can you post a snippet that you’re seeing repeated failures with?

Also, it sounds like you want your command to do the same thing to all bulbs. What’s the reason for only sending “powerOn” to the ones that are off? It’s a no-op if the bulb is already on.

Funnily enough, for an entirely different project I started taking pcaps off of a router, and have seen what the Android app sends to the lights over the LAN, at least on startup. There’s a lot of redundancy. On startup the app runs discovery and collects state information several times. Presumably that helps mitigate failed responses. I didn’t do a pcap for long, so I didn’t see whether there was some kind of keep-alive interaction happening over time as well.

If you’re always going to turn on all the lights, I do recommend turning them all on with a broadcast message (using the LifxLAN() object). Those broadcast messages are very reliable. Who knows, maybe they’ll even “wake up” the lights for unicast messages. That shouldn’t be necessary, however…

When seeing weird connection issues with these bulbs, I have found that physically powering them off, then powering them back on works an astonishing amount of the time.

If that doesn’t work, something to try in lifxlan is to increase the number of attempts from the default (one attempt; see DEFAULT_ATTEMPTS in the library source) to multiple attempts. In a place where normally you would write:

try:
    power = mydevice.get_power()
except WorkflowException as e:
    print(e)  # handle the failed request here

Instead try:

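try:
    # Ask for the power state directly, retrying up to five times.
    response = mydevice.req_with_resp(GetPower, StatePower, max_attempts=5)
    power = response.power_level
except WorkflowException as e:
    print(e)  # handle the failed request here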

Or, you can manually implement multiple retries yourself:

power = None
num_attempts = 0
# Keep asking until the bulb answers or five attempts have failed.
while num_attempts < 5 and power is None:
    try:
        power = mydevice.get_power()
    except WorkflowException:
        num_attempts += 1

(Didn’t test that code, good luck.)

It’ll take five times as long to time out if it’s going to time out, but at least you’ll be able to see whether these issues are just due to the network dropping packets more often than one would like, or whether there is something else at play.
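For what it’s worth, the retry loop can also be wrapped in a small helper that reports how many attempts were needed, which makes it easier to tell occasional packet loss from a bulb that never answers. This is only a sketch: call_with_retries and FlakyDevice are made-up names, and the WorkflowException class below is a stand-in so the example runs without lifxlan or any bulbs. With the real library you would catch the exception it raises instead.

```python
import time


class WorkflowException(Exception):
    """Stand-in for the library's exception (assumption: swap in the real one)."""


def call_with_retries(func, max_attempts=5, delay_secs=0.0,
                      retry_on=(WorkflowException,)):
    """Call func() until it succeeds or max_attempts is reached.

    Returns (result, attempts_used). Re-raises the last error if every
    attempt fails, so the caller still sees the timeout.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return func(), attempt
        except retry_on as err:
            last_error = err
            # Optional pause between attempts; skipped after the final one.
            if delay_secs and attempt < max_attempts:
                time.sleep(delay_secs)
    raise last_error


class FlakyDevice:
    """Hypothetical fake bulb that drops the first few requests."""

    def __init__(self, failures):
        self.failures = failures

    def get_power(self):
        if self.failures > 0:
            self.failures -= 1
            raise WorkflowException("no response from bulb")
        return 65535  # full power, as a plain integer
```

With a real device the call would look something like power, attempts = call_with_retries(mydevice.get_power). If attempts is regularly above 1, that points toward dropped packets rather than a bulb that has gone away entirely.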

If the above code works, that suggests that the failures are due to the network frequently dropping packets. You can permanently increase the library’s default number of attempts by changing DEFAULT_ATTEMPTS in the library source and then re-building and re-installing the library. Beware, however, that if you ever try to write fancy fast light shows, things can get real ugly if you forget to set max_attempts=1 in any fire-and-forget calls. Not the best design, really…yet another refactor to add to the list :slight_smile:

Oh hey, I saw this in another post:

So yes, the official LIFX app does send frequent GetService messages while it’s open.

Interesting…I was thinking of hooking up a packet sniffer, and this helps.

In response to your and @jymbob’s comments: I was sending the power-on command as a possible way of waking up the bulbs, in case there were some kind of sleep-like mode they were in. I think that whole theory of mine is wrong and the bulbs don’t seem to have any kind of idle state.

Actually, last week I did try setting device.DEFAULT_TIMEOUT to 5 and device.DEFAULT_ATTEMPTS to 3. This felt uncomfortably similar to a monkey patch, so I’ll go ahead and use the lower-level method. It didn’t seem to help a lot, anyway.

I’m using Google WiFi, which is a mesh network that is very opaque in its behavior. Related to what you mentioned about WiFi, it’s entirely possible that it does something after a period of time with no traffic to/from a specific client, or maybe the network is just a bit odd. I’m in a 45-unit apartment building, so it’s pretty noisy. Plus, some of the bulbs are in a 50’s-era metal light fixture, so there may be interference.

I think my next step is to try sending GetService, as you describe. I already have a thread that calls LifxLAN.get_lights() every 2 minutes, so I think I’ll try something there. I’ve seen power cycling the bulbs help as well, but the whole point of this thing is to not get out of the chair in the first place :grinning:.
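A rough idea of what a GetService keep-alive could look like without going through lifxlan at all, using only the standard library. This is a sketch based on my reading of the public LIFX LAN protocol documentation (36-byte header, message type 2 for GetService, UDP port 56700), not a tested recipe. The function names and the source value are made up for illustration.

```python
import socket
import struct

LIFX_PORT = 56700          # default LIFX LAN UDP port
GET_SERVICE = 2            # message type for GetService
HEADER_FORMAT = "<HHI8s6sBB8sHH"  # little-endian, 36 bytes, no padding


def build_get_service(source=2, sequence=0):
    """Build a tagged (broadcast) GetService packet with an empty payload."""
    size = struct.calcsize(HEADER_FORMAT)        # header only: 36 bytes
    # protocol 1024 plus the "addressable" and "tagged" bits -> 0x3400
    protocol_bits = 1024 | (1 << 12) | (1 << 13)
    return struct.pack(
        HEADER_FORMAT,
        size,
        protocol_bits,
        source,            # client-chosen id; non-zero so replies come back to us
        b"\x00" * 8,       # target: all zeros means "every device"
        b"\x00" * 6,       # reserved
        0x01,              # flags: res_required set
        sequence & 0xFF,   # wraps at 255
        b"\x00" * 8,       # reserved
        GET_SERVICE,
        0,                 # reserved
    )


def send_get_service(sock, address=("255.255.255.255", LIFX_PORT),
                     source=2, sequence=0):
    """Send one GetService packet on an already-configured UDP socket."""
    sock.sendto(build_get_service(source, sequence), address)


def make_broadcast_socket():
    """Create a UDP socket that is allowed to send broadcast packets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock
```

Calling send_get_service(make_broadcast_socket()) from the existing 2-minute thread, or more often, would be one way to test whether regular traffic keeps the bulbs reachable on this network.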

Thanks to both @mclark and @jymbob for the comments; I’ll keep this topic updated with any progress I make. I’m a retired hobbyist, so things may move slowly.

Quick update: I had been pretty sloppy with exceptions, so my refresh thread was dying after a while. Now that I’ve tightened up the code, it keeps running, even after communication with a device fails. With that in place, it seems to work pretty well.
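In case it helps anyone else, the shape of the fix is roughly the following. This is a simplified sketch rather than my actual code: refresh_loop, discover and on_lights are placeholder names, and discover stands in for whatever does the real discovery (for example a function that calls LifxLAN.get_lights()).

```python
import logging
import threading


def refresh_loop(discover, on_lights, stop_event, interval_secs=120.0):
    """Re-discover lights forever, surviving any single failed cycle.

    discover:   callable returning the current list of lights
    on_lights:  callable that receives that list (e.g. updates a cache)
    stop_event: threading.Event used to shut the loop down cleanly
    """
    while not stop_event.is_set():
        try:
            on_lights(discover())
        except Exception:
            # Log and carry on; one unresponsive bulb must not kill the thread.
            logging.exception("refresh failed; will retry next cycle")
        # wait() doubles as the sleep and returns early when stop is requested.
        stop_event.wait(interval_secs)


def start_refresh_thread(discover, on_lights, interval_secs=120.0):
    """Start the loop in a daemon thread and return (thread, stop_event)."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=refresh_loop,
        args=(discover, on_lights, stop_event, interval_secs),
        daemon=True,
    )
    thread.start()
    return thread, stop_event
```

The important part is that the try/except sits inside the loop, so an exception ends one cycle instead of ending the thread.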

As mentioned earlier, I’ve got a Raspberry Pi Zero W sitting in the corner of a room, and will see what happens after it’s been running continuously for hours and (I hope) days.