Thursday, November 19, 2020

Spring @Scheduled using DynamoDB AWS X-Ray throws SegmentNotFoundException: failed to begin subsegment

Introduction

AWS X-Ray is designed to automatically intercept incoming web requests, see at this introduction.  And also here

But when you start your own thread (either via a Runnable or plain new Thread() or @Scheduled), X-Ray cannot initialise itself: there are no web requests to intercept for it. Then it throws an exception like this: 

Suppressing AWS X-Ray context missing exception (SegmentNotFoundException): Failed to begin subsegment named 'AmazonDynamoDBv2': segment cannot be found.

In the above example the distributed DynamoDB Lock Client was used, which uses DDB in its implementation for acquiring and releasing a lock.

Regular web requests were not throwing this X-Ray exception.

Investigation

Amazon explains that crucial bit of knowledge that web requests are "automagically" setting up the X-Ray recorder a bit here

But that was not fully explaining it with an example. E.g only adding 

AWSXRay.beginSubsegment("AmazonDynamoDBv2") 

(and ending it) but that didn't fix it. That then gave this exception:

Suppressing AWS X-Ray context missing exception (SubsegmentNotFoundException): Failed to end subsegment: subsegment cannot be found.

My suspicion here is that the lock-client already closed the exactly same named subsegment "AmazonDynamoDBv2".
Also, I'm not creating any worker thread myself, Spring is doing it for me.

Note that you at least can avoid exceptions be thrown by setting environment variable AWS_XRAY_CONTEXT_MISSING   to LOG_ERROR. That will only log the above exception.

Solution

Creating a 'parent' segment and setting the trace entity and the subsegment did the job:

Entity parentSegment = AWSXRay.beginSegment("beginSegmentForSomeScheduledTask");
AWSXRay.getGlobalRecorder().setTraceEntity(parentSegment);
AWSXRay.beginSubsegment("AmazonDynamoDBv2");

I did test only creating the subsegment and only creating and setting the parentSegment. But those raised the exception again.
I did not further investigate whether the "AmazonDynamoDBv2" name of the subsegment is essential.
And of course the matching closeSegment() and closeSubsegment() calls of course; not the reverse order: close the last one begun as first.

This thread pointed me in the right direction. This was a next option I would have tried next: setting up my own filter to run it earlier in the (filter) chain; though the @Scheduled task of course does not have a filter. Another workaround would have been to put the X-Ray logging at a very high level:

logging.level.com.amazonaws.xray = SEVERE

Also helped in explaining is this X-Ray reported issue.

Update: additionally, the error was also due to DynamoDB Lock Client! I did not specify withCreateHeartbeatBackgroundThread() when creating the lock client, but the exception of X-Ray showed that it was trying to send a heartbeat. After explicitly setting withCreateHeartbeatBackgroundThread(false) the exception (and error) regarding segment cannot found was fully fixed.