Skip to content

[kotlin2cpg] Add support for KtCallableReferenceExpression#5775

Open
SuperUserDone wants to merge 11 commits intomasterfrom
louis/kotlin2cpg-callablerefrences
Open

[kotlin2cpg] Add support for KtCallableReferenceExpression#5775
SuperUserDone wants to merge 11 commits intomasterfrom
louis/kotlin2cpg-callablerefrences

Conversation

@SuperUserDone
Copy link
Contributor

No description provided.

@SuperUserDone SuperUserDone requested a review from ml86 January 21, 2026 14:01
@SuperUserDone SuperUserDone force-pushed the louis/kotlin2cpg-callablerefrences branch from 8d8282f to 47108a6 Compare February 10, 2026 07:46
@SuperUserDone SuperUserDone force-pushed the louis/kotlin2cpg-callablerefrences branch from 47108a6 to ee3e39e Compare February 10, 2026 07:50
@SuperUserDone
Copy link
Contributor Author

The latest set of changes are major overhaul of the methodology of this PR. This now closely follows what the kotlin compiler does.

For unbound references to a global function ::globalFunction, it works similarly to the lambda method @johannescoetzee and I worked through for the javasrc2cpg case. It creates a MethodRef node, along with a dummy Type, TypeDecl and bindings for the method being overridden in the SAM Interface (Eg invoke for a function). If the SAM Interface is generic, an type-erased binding is also created.

For bound references to a companion object Type::function and to a regular method instance::method, a new Type and TypeDecl is created. The TypeDecl inherits from kotlin.jvm.internals.CallableReference. This class provides an this.receiver object. A constructor and implementation of the SAM Method is lowered into the CPG. The TypeDecl contains bindings for the method with concrete types, and if applicable a type-erased binding is also generated.

This does not work around kotlin compiler bug https://youtrack.jetbrains.com/issue/KT-54316, so the behavior is identical to the Kotlin compiler in the documented case. This only affects the companion case for assigned references.

This affects cases like this. This code won't compile because the Utils::validate ref has a broken signature of (Utils, Int) -> Boolean.

class Utils {
    companion object {
        fun validate(x: Int): Boolean = x > 0
    }
}

val ref: (Int) -> Boolean = Utils::validate

This case is fine tough and types are resolved correctly

class Utils {
    companion object {
        fun validate(x: Int): Boolean = x > 0
    }
}

fun bar(ref: (Int) -> Boolean) {}

bar(Utils::validate)

Copy link
Contributor

@johannescoetzee johannescoetzee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only checked the tests so far, but have left a few comments about the higher-level design and some nitpicks. I'll probably only get to reviewing the implementation tomorrow.

Comment on lines +50 to +52
val forwardedArgs = processCall.argument.isIdentifier.l
forwardedArgs.size shouldBe 2
forwardedArgs.map(_.typeFullName).toSet shouldBe Set("int", "java.lang.String")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the way this is written, it's possible to miss non-identifier arguments or to have the correct arguments in the wrong order without the test catching it. It's a nitpick, but in my opinion it is better to test explicitly, so something like

processCall.argument.map(_.typeFullName).toList shouldBe List(<something>, "int", "java.lang.String")

or

inside(processCall.argument.l) { case List(<something>, intArg: Identifier, stringArg: Identifier) =>
  intArg.typeFullName shouldBe "int"
  ...
}

It's more cumbersome to write and can be harder to read, but is also occasionally useful for catching weird bugs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check that REF edges from the process arguments to the invoke parameters are being created correctly as well (in this method, but also wherever an Identifier node is added to the CPG). A lot of dataflow issues in the past were caused by missing REF edges (or REF edges referring to the wrong local/param) and testing it in the frontend AST tests is much faster than figuring it out through dataflow debugging

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have adjusted some of the tests to use this pattern where I thought it made sense. In most of the tests also added checks for the ref edges to the identifier nodes. There was a bug that I caught through that and resolved.

}

"create a constructor call for the synthetic type with receiver as parameter" in {
val ctorCalls = cpg.call.nameExact("<init>").methodFullName(".*Function2Impl.*<init>.*").l
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
val ctorCalls = cpg.call.nameExact("<init>").methodFullName(".*Function2Impl.*<init>.*").l
val ctorCalls = cpg.call.methodFullName(".*Function2Impl.*<init>.*").l

should return the same, unless there's a type decl with <init> in the name. You could also use methodFullNameExact with the exact method full name to avoid any conflicts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is perhaps a stylistic thing, but I also prefer to write these in a way that gives more information about where the call occurs, for example cpg.method("test").call.... How far to take this depends on the complexity of the code in question though and here just finding the <init> directly is probably fine

Copy link
Contributor

@ml86 ml86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far I only reviewed the non test code. Once the comments are incorporated I will take a look at the tests.

None
}
}
.flatMap(paramDesc => Some(paramDesc.getType))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra flatMap seems superfluous. Just do the .getType in the above flatMap.

protected def createUnboundSamTypeDecl(samInfo: UnboundSamInfo): Ast = {
val samTypeDecl = typeDeclNode(
samInfo.expr,
samInfo.samImplClass.split('.').last,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this quaranteed to always have a '.'?

Copy link
Contributor Author

@SuperUserDone SuperUserDone Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, its not, but does String.split(..) not contain the original string as the only element of the array if so? Am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. For whatever reason I was thinking that split returns an empty array if the character is not contained in the string.


val samMethodFullName = s"${samInfo.methodRefName}:${samInfo.signature}"

val methodRefBinding =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reused createBindingInfo

protected def createBoundSamTypeDecl(samInfo: BoundSamInfo): Ast = {
val samTypeDecl = typeDeclNode(
samInfo.expr,
samInfo.samImplClass.split('.').last,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Garanteed to have .?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. Please ignore.

): Unit = {
val binding = bindingNode(methodName, signature, methodFullName)
bindingInfoQueue.prepend(
BindingInfo(binding, Seq((typeDecl, binding, EdgeTypes.BINDS), (binding, methodNode, EdgeTypes.REF)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The edge meta data for the REF edge is superfluous as long as it points to the same method as indicated by methodFullName. The backend will create the edge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know. Was one of the pain points as it was not possible to get a ref to the method node here, hence the duplication here: #5775 (comment)

val identNode =
identifierNode(samInfo.expr, paramName, paramName, registerType(paramType)).argumentIndex(idx + 1)
paramAst.root
.collect { case p: NewMethodParameterIn =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid single letter variable names.

val identNode =
identifierNode(samInfo.expr, paramName, paramName, registerType(paramType)).argumentIndex(idx + 1)
paramAst.root
.collect { case p: NewMethodParameterIn =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best to have valueParameterAsts be not yet Ast wrapped NewMethodParameterIn nodes. That way you can avoid the collect.

samMethodName: String,
samMethodSig: String,
samGenericMethodSig: Option[String],
receiverParam: (Ast, String),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second tuple value seems never to be used.

parameterInNode(samInfo.expr, "this", "this", 0, false, EvaluationStrategies.BY_SHARING, samInfo.samImplClass)
.dynamicTypeHintFullName(IndexedSeq(samInfo.samImplClass))
val receiverParamName =
samInfo.receiverParam._1.root.collect { case expr: ExpressionNew => expr.code }.getOrElse("receiver")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If expr is e.g. a call expression this will result in strange looking parameter names. I suggest you just always name the parameter receiver.

}
}

private def extractReturnTypeFromSignature(signature: String): String = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method signature you pass into this function was at some point composed from a return type and the parameter types and now you parse it in order to extract the return type again. Please avoid that and propagated the not yet combined return and parameter types.

samInfo.expr,
receiverParamName,
receiverParamName,
"receiver",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use constant for "receiver" string.

.getContributedDescriptors(DescriptorKindFilter.FUNCTIONS, _ => true)
.asScala
.collectFirst {
case functionDescriptor: FunctionDescriptor if functionDescriptor.getModality.toString == "ABSTRACT" =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that we can avoid using .toString and instead check the modality directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct. Will change that. I cannot remember why I wrote it like this.

Comment on lines +34 to +35
val syntheticTypeDecl = cpg.typeDecl.fullName(".*Function2Impl.*").head
val invokeMethod = syntheticTypeDecl.method.name("invoke").head
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All code outside of an actual concrete test that references cpg must be evaluated lazily because it will otherwise trigger the CPG generation even if none of theses tests are selected. The cpg value itself is an exception and does not need to be lazily evaluated because the CPG is only created open first us of cpg. For details look into the KotlinCode2CpgFixture and its super classes.

Comment on lines +84 to +87
receiverAccess.ast.isIdentifier
.name("this")
.head
.typeFullName shouldBe "kotlin.jvm.internal.CallableReference"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static type of this should be the name of the generated wrapper class.

In order to test the field identifier and the this identifier better use receiverAccess.argument(2).asInstanceOf[FieldIdentifier]... and receiverAccess.argument(1).asInstanceOf[Identifier].... That way you encode the expected positions and check that these are really direct AST children of the call instead of being somewhere in the AST below it.

ctorCalls.size shouldBe 1

val ctorCall = ctorCalls.head
ctorCall.methodFullName shouldBe "com.test.Handler.process$kotlin.jvm.functions.Function2Impl.invoke:boolean(int,java.lang.String).<init>:void(com.test.Handler)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fullname seems longer than necessary. Seems like .invoke:boolean(int,java.langString) can be omitted without the danger of a naming collisions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If com.test.Handler.process is overloaded, it might cause a collision, so at the very least the .invoke can be omitted, but I think the signature should stay.

"have constructor body that assigns receiver field" in {
val constructor = syntheticTypeDecl.method.name("<init>").head

constructor.ast.isCall.name("<init>").methodFullName(".*CallableReference.*").l shouldBe List.empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this supposed to test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, was a leftover from #5775 (comment). Will remove.

Comment on lines +187 to +188
val syntheticTypeDecl = cpg.typeDecl.fullName(".*Function1Impl.*").head
val invokeMethod = syntheticTypeDecl.method.name("invoke").head
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again make these lazily evaluated.

val syntheticTypeDecl = cpg.typeDecl.fullName(".*Function1Impl.*").head
val invokeMethod = syntheticTypeDecl.method.name("invoke").head

"use STATIC_DISPATCH without receiver access" in {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validation of static call target is missing. Check call.methodFullName.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also like to see validation that the companion object is given as argument 0 to the validate call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will also explicitly set the argumentIndex in the implementation code to ensure this.

.l
.filter(_.bindingTypeDecl.fullName.contains("globalFunction"))
.sortBy(_.methodFullName)
bindings.size shouldBe 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to select the type decl you are interested in and than use .methodBinding.

class CallableReferenceTests extends KotlinCode2CpgFixture(withOssDataflow = false) {

"CPG for code with simple callback usage" should {
"SAM implementation for bound instance method reference" should {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A binding table test for the generated type decl seems to be missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, only for the unbound case currently exist. Will add for the bound case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants