
Embed Scala source code parameters into .class files meta-data (for better handling of Scala libraries in IDEs) #21894


Open
unkarjedy opened this issue Nov 6, 2024 · 16 comments
Labels
area:settings Issues tied to command line options & settings. area:tooling itype:enhancement

Comments

@unkarjedy
Contributor

Original issue description in IntelliJ Scala Plugin YouTrack:
https://youtrack.jetbrains.com/issue/SCL-20896/Source-code-parameters-for-Scala-libraries


Short summary:
The issue: IDEs can’t reliably interpret Scala source files in libraries due to missing project-specific compiler configurations (like compiler version, compiler options, compiler plugins used during the compilation)
The proposal: Embed essential compiler configurations in metadata (e.g., via a @CompilerOptions annotation) to maintain accurate source interpretation without relying on build tool modifications.


Below is the copy of the original YouTrack ticket description:

Normally, Scala source code is interpreted in the context of a "project", so that each source file has an associated compiler version and compiler options, which might include -Xsource and -Xplugin.

This is true for both the Scala compiler and an IDE. However, things are different when it comes to external libraries - the compiler reads the bytecode, which is inherently unambiguous, whereas an IDE might also show the source code. In such a case, there's no way to determine the original project configuration and thus to unambiguously interpret the code.

It isn't possible to assume that all source code is Scala 3, because Scala 3 and Scala 2 must be interpreted differently. What's more, even Scala 2 now has the -Xsource:3 option, so there's no single "Scala 2" code. While it's possible to detect macro annotations in source code, compiler plugins leave no trace in the source, e.g. kind-projector. Interestingly, the same is also true for distinguishing between different Scala 2.x versions, but this is less of an issue because 2.x is compatible with 2.x+1, and the version of libraries must match the version of a project module: Scala 2.x versions are binary incompatible but source compatible, whereas Scala 2 & 3 / -Xsource:3 / compiler plugins are binary compatible but source incompatible.

A compiler doesn't need to compile the already compiled libraries again, so source JARs are not even downloaded. However, while an IDE can also read APIs from the bytecode, if a user navigates to a class or method, the IDE should be able to show the corresponding source code, which is thus downloaded. Although the IDE doesn't "compile" the source code in the usual sense, syntax highlighting requires lexical analysis, code folding / indent guides require parsing, navigation requires type inference, all of which depend on the "compilation" parameters.

Currently, we rely on the JAR name to determine whether it contains Scala 2 or Scala 3 sources. However, this heuristic is unreliable, and it is not applicable to non-Coursier/Maven JARs or to sources in a directory.
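
For illustration, a file-name heuristic of the kind described above might look like the following minimal Scala sketch. The object name, the regular expression, and the sample file names are assumptions made for this example; this is not the Scala Plugin's actual code.

```scala
// Hypothetical sketch: guess the Scala binary version from an artifact file name.
object JarNameHeuristic {
  // Matches names such as "cats-core_2.13-2.10.0-sources.jar" or "cats-core_3-2.10.0.jar":
  // an artifact id, then "_<binary version>", then "-<artifact version>...jar".
  private val Suffixed = """^.+?_(2\.\d+|3)-\d.*\.jar$""".r

  def scalaBinaryVersion(fileName: String): Option[String] =
    fileName match {
      case Suffixed(version) => Some(version)
      case _                 => None // renamed JARs and plain directories end up here
    }
}
```

Even when the name matches, the result says nothing about -Xsource:3 or compiler plugins, and a renamed JAR or a source directory yields no answer at all.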

While we might detect the presence of .tasty files in a JAR, in principle, a single JAR might contain bytecode from different Scala versions (or modules compiled with different parameters). We might index the .class / .scala correspondence, but it's more complicated and is not applicable to -Xsource:3 or compiler plugins.

We may benefit from a more reliable solution.
For example, each JAR might include used compilation parameters in META-INF, either for the whole JAR, or for the particular source files, etc.
However, in this case the responsibility for bundling of the options would be transferred to the build tools.
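
A minimal sketch of how a tool could read such metadata, assuming a made-up manifest attribute named Scala-Compiler-Options (no build tool is known to emit it). It relies on the standard JAR manifest layout, where the main section applies to the whole JAR and named sections apply to individual entries.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import java.util.jar.{Manifest => JarManifest}

object ManifestOptions {
  // Hypothetical attribute name, chosen only for this example.
  val Key = "Scala-Compiler-Options"

  // Parses manifest text; every line, including the last one, must end with a newline.
  def parse(text: String): JarManifest =
    new JarManifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)))

  // Uses the per-entry value if present, otherwise the JAR-wide value from the main section.
  def optionsFor(manifest: JarManifest, entry: String): Seq[String] = {
    val perEntry = Option(manifest.getAttributes(entry)).flatMap(a => Option(a.getValue(Key)))
    val jarWide  = Option(manifest.getMainAttributes.getValue(Key))
    perEntry.orElse(jarWide).toSeq.flatMap(_.split(' ').toSeq).filter(_.nonEmpty)
  }
}
```

With a main section containing "Scala-Compiler-Options: -Xsource:3" and a section "Name: legacy/Old.scala" carrying its own value, optionsFor returns the per-file value for legacy/Old.scala and the JAR-wide value for every other entry.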

Another option is to store (selected non-default) compiler options in a @CompilerOptions annotation, much like the existing @SourceFile annotation. The advantage is that this doesn't require modifying build tools, and the metadata is per-file.

The simplest solution would store the options just as strings.
There is also a potential room for discussion of minimizing the amount of data stored in the annotation (if it's considered an issue).
E.g. it could store only a subset of "essential" compiler options (then another task would be to identify what are the "essential") OR the compiler options could be encoded somehow.
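
A rough Scala sketch of the annotation idea, under stated assumptions: the CompilerOptions class below is hypothetical (nothing with that name exists in the Scala library), and the list of "essential" option prefixes is only a guess at what such a subset could contain. In the proposal the compiler would attach the annotation itself; it is written by hand here only to show the shape.

```scala
import scala.annotation.StaticAnnotation

// Hypothetical annotation that stores the options as plain strings.
final class CompilerOptions(options: String*) extends StaticAnnotation

object EssentialOptions {
  // Assumed prefixes of options that change how source code is read.
  private val prefixes = Seq("-Xsource", "-Xplugin", "-source", "-language")

  // Keeps only the options that start with one of the assumed prefixes.
  def select(all: Seq[String]): Seq[String] =
    all.filter(opt => prefixes.exists(p => opt.startsWith(p)))
}

// What the compiler might effectively record for a class compiled with -Xsource:3.
@CompilerOptions("-Xsource:3")
class Example
```

Filtering before storing keeps options such as -deprecation, which do not affect how the source is interpreted, out of the metadata.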

@unkarjedy
Contributor Author

The same applies to Scala 2, so I invite @som-snytt, @lryt, @SethTisue to this discussion.

@sjrd
Member

sjrd commented Nov 6, 2024

Also ping @adpi2, who I believe did something in this space for Metals.

@Gedochao
Contributor

cc @tgodzik @rochala

@Gedochao Gedochao added itype:enhancement area:tooling area:settings Issues tied to command line options & settings. labels Nov 12, 2024
@adpi2
Member

adpi2 commented Nov 12, 2024

The issue: IDEs can’t reliably interpret Scala source files in libraries due to missing project-specific compiler configurations (like compiler version, compiler options, compiler plugins used during the compilation)

Thanks @unkarjedy for starting this discussion. This issue also affects the debugger's expression evaluator (and potentially other parts of Metals, though I’m not certain).

However, I don't think that storing these compiler inputs in the class file or TASTy file is ideal, because:

  • Duplication: These values would be identical across all classes in a JAR (except in cases of fat or shaded JARs).
  • Missing information: The compiler only knows the path of the plugin JARs, but not the Maven coordinates which are more useful and trustworthy.
  • This metadata should reside alongside the classpath, which is also an input for the compiler.

Instead, I believe this metadata should be included in the POM file by the build tool that generates it, along with the dependencies.

@SethTisue
Member

I believe this metadata should be included in the POM file by the build tool that generates it

Has there been any prior discussion along these lines...?

@adpi2
Member

adpi2 commented Nov 12, 2024

Has there been any prior discussion along these lines...?

I mentioned that idea during the tooling summit in Madrid, but I don't remember if there is an online discussion about it.

@SethTisue
Member

Ah, I think that's what I'm remembering, yeah.

@SethTisue
Member

@unkarjedy What are examples of the worst thing or things that currently happen to users, for the lack of this?

I can see that in theory:

there's no single "Scala 2" code

could be a source of issues, but I'm curious what the actual user pain currently looks like

@unkarjedy
Contributor Author

Hello. I have several points to comment on, but unfortunately, I am quite busy these weeks.
I will respond some time later, probably next week.

@tgodzik
Contributor

tgodzik commented Nov 20, 2024

We discussed it today and decided that it would be best to put it into pom.xml and that this should be the responsibility of a build tool. We need to check how hard it would be to add.

@adpi2
Member

adpi2 commented Nov 21, 2024

We discussed it today and decided that it would be best to put it into pom.xml and that this should be the responsibility of a build tool. We need to check how hard it would be to add.

I think that Scala CLI uses coursier/publish to create POM files and to publish them. In sbt we still use Ivy (a fork of it) but we would like to move away from it, and we are considering coursier/publish instead (see discussion). So if it is implemented there, it would be a good incentive to make the move. Maybe Mill could use coursier/publish as well, if it is not the case already.

@unkarjedy
Contributor Author

Sorry, I completely lost sight of this ticket.


I still believe that the compiler should store some limited amount of information directly in the Class / Tasty files.
At least those bits of official Scala that affect the language (standard compiler options).
Moving the responsibility to build tool maintainers will only complicate everything.


Duplication: These values would be identical across all classes in a JAR (except in cases of fat or shaded JARs).



AFAIU there is plenty of existing duplication: versions, imports (!), source file, source file paths, etc… (I haven’t rechecked everything).

Added information shouldn’t take much extra space.

If the project is compiled as a plain Scala without any syntax modifiers, then there will be practically no overhead.

If we encode only the essential compiler options that change the language syntax/semantics/produced bytecode/tasty, we might need just 1 byte per file (in practice even 3-4 bits could be enough).
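
To make the "1 byte per file" idea concrete, here is a minimal Scala sketch of a bit-flag encoding. The table of known options and their bit positions is an assumption for this example; the thread does not define one.

```scala
object OptionBits {
  // Hypothetical table: the bit position is the index in this list,
  // so at most 8 entries fit into a single byte.
  val known: Vector[String] = Vector("-Xsource:3", "-Xsource:3-cross", "-Yno-imports")

  // Sets one bit for every known option that is present; unknown options are ignored.
  def encode(options: Set[String]): Byte = {
    val bits = known.zipWithIndex.foldLeft(0) { case (acc, (opt, i)) =>
      if (options.contains(opt)) acc | (1 << i) else acc
    }
    bits.toByte
  }

  // Restores the set of known options from the bits.
  def decode(bits: Byte): Set[String] =
    known.zipWithIndex.collect { case (opt, i) if (bits & (1 << i)) != 0 => opt }.toSet
}
```

A file compiled without any of the listed options encodes to 0, so such a value could simply be omitted.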

If we encode the plugins, then yes, there will be more info to be kept.

But it’s not the default use case.
If a project used some plugins, then that is worth recording in the generated binary files.

Using a compiler plugin is a big thing worth mentioning.


Missing information: The compiler only knows the path of the plugin JARs, but not the Maven coordinates which are more useful and trustworthy.


AFAIU The information about the plugin can be extracted from the plugin jar META-INF.
For example “Implementation-Title” + “Implementation-Version” - I’ve checked with 3 popular plugins and it’s present there.
(“kind-projector”, “better-monadic-for”, “wartremover”)
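
A small Scala sketch of that lookup, using the standard java.util.jar API. The object and method names are made up for this example, and the result is empty when either attribute is missing.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import java.util.jar.{Attributes, Manifest => JarManifest}

object PluginIdentity {
  // Parses manifest text; every line, including the last one, must end with a newline.
  def parse(text: String): JarManifest =
    new JarManifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)))

  // Returns (title, version) only when both standard attributes are present.
  def fromManifest(manifest: JarManifest): Option[(String, String)] = {
    val main = manifest.getMainAttributes
    for {
      title   <- Option(main.getValue(Attributes.Name.IMPLEMENTATION_TITLE))
      version <- Option(main.getValue(Attributes.Name.IMPLEMENTATION_VERSION))
    } yield (title, version)
  }
}
```

For a real plugin JAR the manifest would come from java.util.jar.JarFile#getManifest, which can itself return null when the JAR has no manifest.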

This metadata should reside alongside the classpath, which is also an input for the compiler.

I don’t think I fully got it.

As I understand, the Scala version itself resides in each binary file, right?

Same arguments should be applied to some compiler options.

“Scala version X” alone doesn’t describe which exact Scala (out of an infinity of combinations) was used.

@unkarjedy
Contributor Author

unkarjedy commented Jun 7, 2025

@SethTisue

What are examples of the worst thing or things that currently happen to users, for the lack of this?
I can see that in theory:

there's no single "Scala 2" code

could be a source of issues, but I'm curious what the actual user pain currently looks like

Right now I am not aware of any external user issues caused by this.
But that can be due to multiple things:

  • Some bug reports may simply not have been processed by us far enough to understand that this was the root cause.
    If we get a bug report in the debugger, we might need to dig deep to infer what the root cause is.
  • Still-low adoption of Scala 3 and of libraries compiled with -Xsource:3.
  • Also, most of the users remain silent even when bugs happen. They just got used to things not working.

But I suffer from this issue myself, and that's what reminded me of this now.
We have this sbt plugin: https://github.com/JetBrains/sbt-idea-plugin.
The latest sbt uses -Xsource:3 and the plugin source code also uses it.
I noticed the issue when debugging the plugin in the Scala Plugin codebase.
I guess the issue is relevant at least when users try to debug any sbt plugins that are actively maintained.

Basic things might just not work, and without digging deep one wouldn't understand that it's caused by this.
Stopping at a breakpoint will mostly work, but evaluating code or watches might not.
It depends on the code.
For example, if your file uses * in imports or as instead of =>, the code that relies on that won't work.


UPD
"Go to" from library sources doesn't work either if it involves any dependencies from imports written in -Xsource:3 style.


@tgodzik
Contributor

tgodzik commented Jun 10, 2025

I understand the issue and I think we do need to fix it, but adding things to pom.xml or to manifest would really be the way to go.

AFAIU there is plenty of existing duplication: versions, imports (!), source file, source file paths, etc… (I haven’t rechecked everything).

Added information shouldn’t take much extra space.

Sure, but for a really large codebase that actually becomes a problem. And if we really don't have to add the info to each classfile, we should not do that.

AFAIU The information about the plugin can be extracted from the plugin jar META-INF.
For example “Implementation-Title” + “Implementation-Version” - I’ve checked with 3 popular plugins and it’s present there.
(“kind-projector”, “better-monadic-for”, “wartremover”)

And if this information is not present? I also don't think we can extract the repository it is published in without the build tool. So this would be a really limited mechanism.

Moving the responsibility to build tool maintainers will only complicate everything.


It's actually less complicated since they already create those pom.xml and manifests. This logic is already there, we just need a bit more info.

But I still think the biggest issue, and one that will likely be vetoed, is making every classfile larger without a clear benefit over using build tool mechanisms. And as you said yourself, this hasn't even been reported by any user. For your case the solution would be to always add -Xsource:3 when analyzing dependency sources.

@unkarjedy
Contributor Author

unkarjedy commented Jun 12, 2025

And if this information is not present?

AFAIU there are some contracts saying that it should be there and build tools already add it there.
I didn't see plugins without this info.

I also don't think we can extract the repository it is published in without the build tool. So this would be a really limited mechanism.

Why is the repository needed at all?

But I still think the biggest issue, and one that will likely be vetoed, is making every classfile larger without a clear benefit over using build tool mechanisms. And as you said yourself, this hasn't even been reported by any user.

JFTR
I made a script to quickly check how much the size of JAR files would increase for 10 popular Scala libraries.
If we add an annotation with only the compiler options encoded as a short, the JAR file size increase is about 1.4-3%.
If we add an annotation with the compiler options and 3 plugins, then it's a ~4-6% increase.

Given the use case, it might indeed be a bit much.

For your case the solution would be to always add -Xsource:3 when analyzing dependency sources.

This hack can backfire: in Scala 2, import p.* is a valid import without -Xsource:3, and it has different semantics.
And & is a valid type in Scala 2.

@unkarjedy
Contributor Author

Yeah, after sleeping on it, I think that it would make sense to add it to the build tool.
I would vote for adding this info into META-INF instead of pom.xml to only rely on the jar file contents, not on the repo internal machinery.
