Some weird observations when writing automated tests

I’m working on my LIFX Haskell library again, and currently I’m writing some automated tests for my library using HUnit (similar to junit, etc.). The approach I’m taking at the moment is to run the tests on my real bulbs, rather than trying to mock the bulbs.

I’m running into a few weird things:

  1. Occasional TLS errors. It doesn’t happen all that often, but once in a while I’m getting Exception: TlsExceptionHostPort (HandshakeFailed (Error_Packet_unexpected "Alert [(AlertLevel_Fatal,BadRecordMac)]" " expected: change cipher")) "api.lifx.com" 443.
    I’m not sure whether this is a bug on LIFX’s end, or a bug in Haskell’s TLS implementation.

  2. If I set the hue to something, and also set the saturation to 0 in the same command, and then read it back, the hue is still set to its previous value. Although this doesn’t make any difference in the actual color of the light (since hue is irrelevant when saturation is 0), it causes a problem for my tests, because I want to start each test by resetting the bulbs to a known state. (So that what happens in one test doesn’t affect the next one.) I was setting the HSBK to 0, 0, 0.5, 5000, but that was leaving the hue unchanged from the previous test. I worked around the problem by making my “known state” 0, 0.1, 0.5, 5000. This succeeds in setting the hue to 0, since the saturation is nonzero.

  3. In general, when I write values to the bulb and read them back, I’m getting a certain amount of error. Since the LAN protocol uses 16-bit integers, I would expect the floating point values in the cloud interface to get rounded to the nearest 1/65536. However, the amount of error I’m seeing is larger than that. I had to make my “acceptable error” 1/1000 before I started getting reliable results. Even weirder is that I’m seeing error in the kelvin, which is an integer; I might write 5000 and get back 4998. The weirdest thing is that the error seems dependent on the previous state. This makes it hard to write independent tests, since the state from one seems to “bleed over” into the next. (And I’m using a duration of 0, so I wouldn’t expect to see “in-between” values as the bulb changes from one state to another.)

I’m curious if anyone else has run into any of these problems, or if it’s just me. And for authors of LIFX libraries in other languages, I’d be curious to know what approach you’re taking to test your library.

  1. We terminate SSL in our own infrastructure using HAProxy and the latest OpenSSL. This is a fairly common setup and well tested. I can’t comment on the Haskell TLS implementation but I’ve had this working with PolarSSL (now mbed TLS), Go’s TLS implementation and of course languages using OpenSSL (Perl, Ruby, Python). So it is likely that there is a bug in the Haskell TLS implementation or something is attempting to modify or rewrite the TLS connection between you and our servers. I would love to hear how you go investigating this.

  2. I’m not sure why this would happen, but I’ve raised a bug internally to investigate it for you.

  3. This one worries me, I’ve raised a bug for this too.

From the response headers, it looks like the user-facing side of your API is terminated by CloudFlare. (Though I would expect that to be rock-solid, too, of course.)

If I continue to see this problem, I’ll try to file a bug against the Haskell TLS implementation. Unfortunately, it’s fairly rare, and I don’t know of any way to reproduce it on demand, so I doubt there’s much Vincent Hanquez (author of Haskell’s TLS library) can do with the information I have so far.

Thanks, Daniel! I appreciate that. Let me know if there’s any more information you need from me, since these two issues are easy for me to reproduce. This is with my Original 1000 bulbs, so I don’t know if the same thing happens with the newer bulbs.

I should also clarify that problems #2 and #3 happen with both the cloud API and the LAN protocol. So I would guess it’s something in the firmware itself, and not in the cloud.

Oh right of course! Yeah CloudFlare terminates the user SSL section and forwards it to us (although sometimes we disable CloudFlare and send requests directly to our servers). I would imagine their SSL stack is pretty solid too.

In that case I’ve filed the bugs wrong, I’ll make sure they are assigned to the firmware team.

I haven’t seen the TLS error lately, but I have seen 523 Origin Unreachable several times over the last 24 hours. The CloudFlare page says that 523 can be caused by Railgun, which I know you guys are using.

Interesting. I’ll raise this with CloudFlare. It’d be awesome if you see it again if you could gather the HTTP response headers so they can find them in their logs. Without this we don’t usually get much back from their support team. I haven’t seen this issue myself, and I haven’t seen any measurable drop in the requests to the API so I assume the impact is small for now.

Great minds think alike. :smile: I was just thinking I ought to post this:

[("Server","cloudflare-nginx"),("Date","Tue, 22 Mar 2016 06:09:17 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=ddab70856c1f84425b6dc5d610145c53c1458626953; expires=Wed, 22-Mar-17 06:09:13 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","28777938d1ed211a-LAX")]

Sorry about the formatting; that is how Haskell prints it out. But all the headers are there.

This evening, I’ve been seeing it approximately once every time I run my test suite. My test suite probably makes several hundred API requests, so that’s still just a fraction of one percent.

Here are six more of the 523 Origin Unreachable events (one per line), if that helps any:

[("Server","cloudflare-nginx"),("Date","Tue, 22 Mar 2016 06:25:56 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d15fd3c9791ae7c1d80b82b9042a4444b1458627951; expires=Wed, 22-Mar-17 06:25:51 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","2877919bfb7d227c-LAX")]
[("Server","cloudflare-nginx"),("Date","Tue, 22 Mar 2016 06:50:43 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=dfa5d5ca56a9a567075a0cba45d9208331458629441; expires=Wed, 22-Mar-17 06:50:41 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","2877b5f880a40364-LAX")]
[("Server","cloudflare-nginx"),("Date","Wed, 23 Mar 2016 05:31:27 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d3fea39c94f4878913ef7d298b47c34b41458711082; expires=Thu, 23-Mar-17 05:31:22 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","287f7f2a568e22ac-LAX")]
[("Server","cloudflare-nginx"),("Date","Wed, 23 Mar 2016 05:39:39 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d47ee696cbb42faffd164ece2dd1ef49d1458711575; expires=Thu, 23-Mar-17 05:39:35 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","287f8b33aabc211a-LAX")]
[("Server","cloudflare-nginx"),("Date","Wed, 23 Mar 2016 05:42:11 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d3b01a3ab768904f14a0a55e40dca50f41458711728; expires=Thu, 23-Mar-17 05:42:08 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","287f8eec4ab007d9-LAX")]
[("Server","cloudflare-nginx"),("Date","Wed, 23 Mar 2016 06:09:46 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d54232ef088abe07486774bfc6e94be221458713382; expires=Thu, 23-Mar-17 06:09:42 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","287fb74e9a7922ac-LAX")]

I’m still seeing 523 Origin Unreachable at about the same frequency.

[("Server","cloudflare-nginx"),("Date","Mon, 11 Apr 2016 06:29:49 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d7bd9dba8191201594d3caed586ad532c1460356188; expires=Tue, 11-Apr-17 06:29:48 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","291c62e2664d2222-LAX")]
[("Server","cloudflare-nginx"),("Date","Mon, 11 Apr 2016 16:19:42 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d8f3697047b01de638a972105aba6b6b71460391578; expires=Tue, 11-Apr-17 16:19:38 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","291fc2e45d472276-LAX")]

Any word from CloudFlare on this? And any update on issues #2 and #3 from the original post?

I’m still working with CloudFlare to debug our railgun setup, but it does only seem to be occurring for some people. Here in AU things seem to be working fine. I’ll follow them up with this new data and I’ll let you know if any further information might help.

With regards to #2 and #3, I’ve asked the firmware team and they have yet to get back to me. My guess is that #2 is an optimization that has something to do with fading between colors, and #3 could be due to the firmware actually using a lower resolution number somewhere deep in the firmware. Of course those are just my best guess. I’ll update here when the firmware team give me a response.

@ppelleti Any chance you could test now and see if you still see the railgun failures. CloudFlare have asked us to ratchet up the debugging on the railgun process, so we’ll get some super detailed logging.

Sure, I was able to get four failures this evening:

[("Server","cloudflare-nginx"),("Date","Wed, 13 Apr 2016 05:10:52 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d9938a218d904582e0588dfb039e808d61460524248; expires=Thu, 13-Apr-17 05:10:48 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","292c69e6091322ac-LAX")]
[("Server","cloudflare-nginx"),("Date","Wed, 13 Apr 2016 05:14:19 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d9951f2a4d7e3b9514c712d2d0973c1071460524454; expires=Thu, 13-Apr-17 05:14:14 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","292c6ef2714720f6-LAX")]
[("Server","cloudflare-nginx"),("Date","Wed, 13 Apr 2016 05:29:42 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d8f036a9027601080e0542c286bc071361460525377; expires=Thu, 13-Apr-17 05:29:37 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","292c857a7c8f2222-LAX")]
[("Server","cloudflare-nginx"),("Date","Wed, 13 Apr 2016 05:38:51 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=db46f03ea92ab3e2c79b23fd9e6f158051460525926; expires=Thu, 13-Apr-17 05:38:46 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","292c92df6d8c036e-LAX")]

Thanks!

It turns out it was a mixture of issues inside our infrastructure. We’ve now corrected this and things should be performing normally. Could you confirm that the issue is resolved?

I haven’t seen any 523 Origin Unreachable, but I’m seeing some new errors I haven’t seen before.

So, far, I’ve seen two instances of 520 Origin Error:

[("Server","cloudflare-nginx"),("Date","Thu, 14 Apr 2016 00:42:25 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d6f7f5111f222827fd8e287eabb17218c1460594535; expires=Fri, 14-Apr-17 00:42:15 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","29331de570d822fa-LAX")]
[("Server","cloudflare-nginx"),("Date","Thu, 14 Apr 2016 00:45:33 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d8d7ce44b602e20d3a26d85720006eaab1460594723; expires=Fri, 14-Apr-17 00:45:23 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","2933227f1cf413e3-LAX")]

And I’ve seen one instance of 503 Service Unavailable:

[("Server","cloudflare-nginx"),("Date","Thu, 14 Apr 2016 01:00:46 GMT"),("Content-Type","text/html"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d324c44a43a3141b444a4d068496771d51460595637; expires=Fri, 14-Apr-17 01:00:37 GMT; path=/; domain=.lifx.com; HttpOnly"),("Cache-Control","private,max-age=5"),("Cf-Railgun","6d5fffe463 99.99 8.642268 0030 fe49"),("CF-RAY","293338cfbd260373-LAX")]

Here are some more errors.

520 Origin Error:

[("Server","cloudflare-nginx"),("Date","Thu, 14 Apr 2016 01:54:48 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d3ea8f586fd271a28af4b1ac77c791eb21460598880; expires=Fri, 14-Apr-17 01:54:40 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","293387fd0dcb20ae-LAX")]
[("Server","cloudflare-nginx"),("Date","Thu, 14 Apr 2016 02:00:54 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d0d8d24dcae67e981eb67a4c4ba33f3e31460599240; expires=Fri, 14-Apr-17 02:00:40 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","293390c7aa89068b-LAX")]
[("Server","cloudflare-nginx"),("Date","Thu, 14 Apr 2016 02:01:28 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d9b3f3fc54d8b357fb28d11751c02fcb81460599283; expires=Fri, 14-Apr-17 02:01:23 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","293391d36a2b068b-LAX")]
[("Server","cloudflare-nginx"),("Date","Thu, 14 Apr 2016 02:26:06 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d510aba378e3ee3ca7794878d8c06297d1460600761; expires=Fri, 14-Apr-17 02:26:01 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","2933b5ea1ec722d0-LAX")]
[("Server","cloudflare-nginx"),("Date","Thu, 14 Apr 2016 02:31:58 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d34058850c00645524d734177ff5dafd71460601111; expires=Fri, 14-Apr-17 02:31:51 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","2933be745f9e2270-LAX")]
[("Server","cloudflare-nginx"),("Date","Thu, 14 Apr 2016 02:32:27 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d72a14571868fd7c2e55d2b630439e93b1460601142; expires=Fri, 14-Apr-17 02:32:22 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","2933bf32559f2270-LAX")]
[("Server","cloudflare-nginx"),("Date","Thu, 14 Apr 2016 02:35:36 GMT"),("Content-Type","text/html; charset=UTF-8"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d18c3a02fafbfc4bf2f66bd45898aed3f1460601324; expires=Fri, 14-Apr-17 02:35:24 GMT; path=/; domain=.lifx.com; HttpOnly"),("Pragma","no-cache"),("X-Frame-Options","SAMEORIGIN"),("CF-RAY","2933c3a8e3992270-LAX")]

503 Service Unavailable:

[("Server","cloudflare-nginx"),("Date","Thu, 14 Apr 2016 02:01:14 GMT"),("Content-Type","text/html"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d62156971e6105c6f27eb55f55a0aaa391460599268; expires=Fri, 14-Apr-17 02:01:08 GMT; path=/; domain=.lifx.com; HttpOnly"),("Cache-Control","private,max-age=5"),("Cf-Railgun","5efc56327c 99.99 5.583939 0030 fe49"),("CF-RAY","29339174f4a0068b-LAX")]
[("Server","cloudflare-nginx"),("Date","Thu, 14 Apr 2016 02:01:43 GMT"),("Content-Type","text/html"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d1ee90ff1c7f6bb998d25d606bb91e1721460599303; expires=Fri, 14-Apr-17 02:01:43 GMT; path=/; domain=.lifx.com; HttpOnly"),("Cache-Control","private,max-age=5"),("Cf-Railgun","89d42c4c8d 0.09 0.003540 0030 fe49"),("CF-RAY","2933924dbfa9068b-LAX")]
[("Server","cloudflare-nginx"),("Date","Thu, 14 Apr 2016 02:02:05 GMT"),("Content-Type","text/html"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d523bdb4ef443c5d882523531747481351460599325; expires=Fri, 14-Apr-17 02:02:05 GMT; path=/; domain=.lifx.com; HttpOnly"),("Cache-Control","private,max-age=5"),("Cf-Railgun","9fec237073 99.99 0.005725 0030 fe49"),("CF-RAY","293392d9773d068b-LAX")]
[("Server","cloudflare-nginx"),("Date","Thu, 14 Apr 2016 02:02:07 GMT"),("Content-Type","text/html"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d8130457ec25dbc34b6bd533345651d101460599327; expires=Fri, 14-Apr-17 02:02:07 GMT; path=/; domain=.lifx.com; HttpOnly"),("Cache-Control","private,max-age=5"),("Cf-Railgun","6c8cefd135 0.09 0.001749 0030 fe49"),("CF-RAY","293392e487c2068b-LAX")]
[("Server","cloudflare-nginx"),("Date","Thu, 14 Apr 2016 02:02:09 GMT"),("Content-Type","text/html"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d2ef1e55334fa7a6a92e8943b0f1c34371460599329; expires=Fri, 14-Apr-17 02:02:09 GMT; path=/; domain=.lifx.com; HttpOnly"),("Cache-Control","private,max-age=5"),("Cf-Railgun","1c998af33c 0.09 0.002107 0030 fe49"),("CF-RAY","293392ee4831068b-LAX")]
[("Server","cloudflare-nginx"),("Date","Thu, 14 Apr 2016 02:15:50 GMT"),("Content-Type","text/html"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d5243ed0a2f3b9bbe886d766fa052c56c1460600144; expires=Fri, 14-Apr-17 02:15:44 GMT; path=/; domain=.lifx.com; HttpOnly"),("Cache-Control","private,max-age=5"),("Cf-Railgun","c8222877f5 99.99 5.295891 0030 fe49"),("CF-RAY","2933a6d91fbf2270-LAX")]
[("Server","cloudflare-nginx"),("Date","Thu, 14 Apr 2016 02:17:11 GMT"),("Content-Type","text/html"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=d14ee90ea755438328bc120537dc342411460600221; expires=Fri, 14-Apr-17 02:17:01 GMT; path=/; domain=.lifx.com; HttpOnly"),("Cache-Control","private,max-age=5"),("Cf-Railgun","93e9835b79 99.99 10.053552 0030 fe49"),("CF-RAY","2933a8b6d8a92270-LAX")]
[("Server","cloudflare-nginx"),("Date","Thu, 14 Apr 2016 02:22:33 GMT"),("Content-Type","text/html"),("Transfer-Encoding","chunked"),("Connection","keep-alive"),("Set-Cookie","__cfduid=dcd4f75532fc9626fac5397e9aaacc1b21460600542; expires=Fri, 14-Apr-17 02:22:22 GMT; path=/; domain=.lifx.com; HttpOnly"),("Cache-Control","private,max-age=5"),("Cf-Railgun","36beb92154 99.99 11.130691 0030 fe49"),("CF-RAY","2933b08c9a5e0d49-LAX")]