Skip to content

[kotlin2cpg] Lambda capture fix (#8544)#5739

Open
SuperUserDone wants to merge 3 commits intomasterfrom
louis/kotlin-lambda-captures
Open

[kotlin2cpg] Lambda capture fix (#8544)#5739
SuperUserDone wants to merge 3 commits intomasterfrom
louis/kotlin-lambda-captures

Conversation

@SuperUserDone
Copy link
Contributor

@SuperUserDone SuperUserDone commented Jan 13, 2026

Resolves: https://github.com/ShiftLeftSecurity/codescience/issues/8544

This commit solves an issue where lambdas incorrectly capture variables that aren't used within the lambda. As an example, for the given code, the method parameter x and Bar.this are captured into the lambda. Only baz should be captured into the lambda.

package simple.pkg

class Bar {
    fun foo(x: String) {
        val baz: String = "BAZ"
        1.let { println(baz) }
    }
}

@SuperUserDone SuperUserDone requested a review from ml86 January 13, 2026 14:29
@SuperUserDone SuperUserDone marked this pull request as ready for review January 13, 2026 14:29
@SuperUserDone
Copy link
Contributor Author

@SuperUserDone
Copy link
Contributor Author

SuperUserDone commented Feb 24, 2026

I have changed the logic, as the previous approach did not properly handle nested lambdas. Now the logic traverses the CompilerAPI AST directly to find captures instead of trying to first create the CPG then trying to deduce closure bindings from that.

@SuperUserDone SuperUserDone force-pushed the louis/kotlin-lambda-captures branch from 28fe7b9 to baa0e6a Compare February 26, 2026 11:17
@johannescoetzee
Copy link
Contributor

Could you please add a test for nested lambdas confirming that capture locals and the corresponding closure bindings are created for each intermediate step (if this does not already exist outside this PR)?

For example in java

import java.util.function.Supplier;

class Test {
  Supplier<Integer> foo(int capturedParam) {
    return () -> {
      Supplier<Integer> sp = () -> capturedParam;
      return sp.get();
    };
  }
}

the outer lambda would need a local and closure binding for the capturedParam capture and the inner lambda should have a local and closure binding for capturing the capturedParam local in the outer lambda (so the overall capture happens in 2 steps).

I recently had to fix this for javasrc2cpg: #5795

@SuperUserDone
Copy link
Contributor Author

I have added assertions to an existing test. Let me know if you think that would be sufficient?
https://github.com/joernio/joern/pull/5739/changes#diff-f49ca26417025a34de759762bf498683a5d2a6aaa1a4463d46e4d6328ccda0d5R591-R626

@SuperUserDone SuperUserDone force-pushed the louis/kotlin-lambda-captures branch from 908ce17 to 839bcc1 Compare March 3, 2026 06:45

cb._refOut.size shouldBe 1
"should only capture the used local" in {
cpg.methodRef.outE.collectAll[Capture].size shouldBe 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wherever possible avoid going to the edge level in tests and use steps instead. Here _closureBindingViaCaptureOut.

Comment on lines +117 to +118
cpg.closureBinding._localViaRefOut.name.toSet shouldBe Set("baz")
cpg.closureBinding._methodParameterInViaRefOut.name.toSet shouldBe empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of starting with new cpg.... queries it would be better to make a variable for the method reference and then expand from there. Like it is now you do not test that the closure bindings are associated with the right methodRef.

case node: NewMethodParameterIn => NodeContext(node, node.name, node.typeFullName)
case node: NewLocal => NodeContext(node, node.name, node.typeFullName)
}
.filter(capturedNodeContext => hasDestructuringParam || referencedNames.contains(capturedNodeContext.name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are all capturedNodeContext taken if there is even a single destructuring parameter?
Are there tests for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking a bit more about it, this is a bit of a hack. The idea is that a lambda with at least one destructuring parameter bypasses usage filtering and captures all visible outer locals/params as a compatibility workaround for destructuring. The OSS dataflow engine seems very particular about these sorts of things. I'll try to think of a better method to solve this issue.

Ideally true usage-based capturing would be possible, but unfortunately this heuristic based method is the best that I found possible with my current understanding of the kotlin compiler API and internals. I'll do some more reading into the internals and reading more into how other frontends handle this for a potential cleaner fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +541 to +545
val generatedName = s"${Constants.DestructedParamNamePrefix}${destructedParamKeyPool.next}"
paramAsts.append(astForParameter(param, valueParamStartIndex + idx, Some(generatedName)))
destructedParamAsts.appendAll(astsForDestructuring(param, Some(generatedName)))
} else {
paramAsts.append(astForParameter(param, valueParamStartIndex + idx))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be an unrelated fix for the naming for destructuring parameters. Cant find a corresponding change in the tests. What am I missing?

Copy link
Contributor Author

@SuperUserDone SuperUserDone Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generated destructuring parameter name is passed into astsForDestructuring(...) so the synthetic it initializer (tmp = it) can be wired to the real lambda parameter node instead of a standalone local, which restores correct closure/dataflow edges (at least for the OSS dataflow engine).
That was needed because, after tightening capture to used variables, the old destructuring path lost the parameter-to-destructured-value link and caused the lambda dataflow test to go empty, so this change preserves the expected flow.

I'll add a test to test the parameter name generation. The Kotlin2cpg frontend does have a lot of missing tests for edge cases like this. I'll make a note for future work to also check that tests are sufficient to test edge cases like this.

The test that caught this issue was this:

https://github.com/joernio/joern/blob/bac40363833e6ec71801e2d37f1e326f51d7f71a/joern-cli/frontends/kotlin2cpg/src/test/scala/io/joern/kotlin2cpg/dataflow/LambdaTests.scala

@SuperUserDone
Copy link
Contributor Author

SuperUserDone commented Mar 5, 2026

I have pushed another commit with some small changes, but still need to think through #5739 (comment) and #5739 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants