Closed
30 commits
8b3be64
Added some links and implementation details
Dec 7, 2018
f9ba6f3
create SI draft
Dec 7, 2018
066b9c9
update
Dec 7, 2018
2e727e1
Merge pull request #1 from jash11/details
Dec 7, 2018
ad74cdd
Update 1NNN-JSH.md
Dec 7, 2018
6c28dba
Rephrase "string interpolation" to "string sequence literal"
Dec 7, 2018
5ce9b88
Expand abstract.
Dec 8, 2018
261ef14
add pros/cons of library
Dec 8, 2018
dd797a2
add library cons
Dec 8, 2018
97c4f0a
Merge branch 'master' into abstract-wording
Dec 8, 2018
4c929d6
Merge pull request #2 from quickfur/abstract-wording
Dec 8, 2018
f9502f4
link to rationale
Dec 8, 2018
800bf97
move lib vs lang after description
Dec 8, 2018
0d14031
wording
Dec 8, 2018
ce8aa9f
Need to support arbitrary expressions
schveiguy Dec 8, 2018
7d363af
Move new sections to Description
schveiguy Dec 8, 2018
3569537
Change title
Dec 8, 2018
1a10e99
Merge pull request #3 from jash11/schveiguy-patch-1
Dec 8, 2018
35fb28e
Merge pull request #4 from jash11/change-title
Dec 8, 2018
88b141c
move a paragraph to optional improvements
Dec 8, 2018
83f98cf
Optional improvements -> Possible improvements
Dec 8, 2018
cfcc96d
Remove changes to alias
pbackus Dec 8, 2018
e8b8c58
Merge pull request #5 from jash11/no-alias-changes
pbackus Dec 8, 2018
add9e91
add database example (still needs work)
Dec 8, 2018
4e42bfe
keeps terminology consistant
Dec 8, 2018
553076d
fix example
Dec 8, 2018
ad4ebbf
spelling
Dec 8, 2018
f3ae2a0
Remove "expressions" section
Dec 8, 2018
cbd5122
Merge pull request #6 from jash11/jash11-patch-1
Dec 10, 2018
ebfc529
Update 1NNN-JSH.md
Jan 23, 2019
143 changes: 143 additions & 0 deletions Drafts/1NNN-JSH.md
@@ -0,0 +1,143 @@
# String Syntax for Compile-Time Sequences

| Field | Value |
|-----------------|-----------------------------------------------------------------|
| DIP: | (number/id -- assigned by DIP Manager) |
| Review Count: | 0 (edited by DIP Manager) |
| Author: | Jason Helson |
| Implementation: | https://git.io/fpSUA |
| Status: | Will be set by the DIP manager (e.g. "Approved" or "Rejected") |

## Abstract

This DIP proposes adding a "string sequence literal" to D, primarily inspired
by string interpolation, but also applicable to a wide variety of use cases.

In a nutshell, this literal:

````
i"Hello, ${name}! You have logged on ${count} times."
````

is translated to the [compile-time sequence](https://dlang.org/articles/ctarguments.html):

````D
"Hello, ", name, "! You have logged on ", count, " times."
````
Note that the compiler does not perform any string interpolation; it merely
segments the literal into a sequence of strings and expressions. The intent is for
further processing to be done in library code (see [rationale](#rationale) for a more detailed
description of possible applications).
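
As a rough illustration of the lowering, the sketch below writes out by hand the sequence that the literal above would become and passes it to existing variadic library functions. It assumes Phobos is available; the function name `greeting` and the sample values are hypothetical, and the proposed `i"..."` syntax appears only in a comment.

```D
import std.conv : text;
import std.stdio : writeln;

// Hypothetical helper. The proposed literal
//     i"Hello, ${name}! You have logged on ${count} times."
// would lower to the argument list written out by hand below.
string greeting(string name, int count)
{
    return text("Hello, ", name, "! You have logged on ", count, " times.");
}

void main()
{
    assert(greeting("Ada", 3) == "Hello, Ada! You have logged on 3 times.");

    // The same sequence can be handed to any variadic function, e.g. writeln.
    string name = "Ada";
    int count = 3;
    writeln("Hello, ", name, "! You have logged on ", count, " times.");
}
```

Here `text` does the actual concatenation; the compiler's only job under this proposal would be to produce the argument list.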

### Reference

- Exploration: https://github.com/marler8997/interpolated_strings
- Example Library Solution: https://github.com/dlang/phobos/pull/6339/files
- Implementation: https://github.com/dlang/dmd/pull/7988
- https://forum.dlang.org/thread/khcmbtzhoouszkheqaob@forum.dlang.org
- https://forum.dlang.org/thread/c2q7dt$67t$1@digitaldaemon.com
- https://forum.dlang.org/thread/qpuxtedsiowayrhgyell@forum.dlang.org
- https://forum.dlang.org/thread/ncwpezwlgeajdrigegee@forum.dlang.org
- https://dlang.typeform.com/report/H1GTak/PY9NhHkcBFG0t6ig (#3 in "What language features do you miss?")

## Contents
* [Rationale](#rationale)
* [Description](#description)
* [Language Feature vs Library Feature](#language-feature-vs-library-feature)
* [Breaking Changes and Deprecations](#breaking-changes-and-deprecations)
* [Copyright & License](#copyright--license)
* [Reviews](#reviews)

## Rationale

Sequence literals apply to a wide range of use cases. A few of these use cases are outlined below.

#### String Interpolation
One notable use for sequence literals is in string interpolation, which allows for more concise, readable, and maintainable code. For example:


src/build.d:556:<br>
`auto hostDMDURL = "http://downloads.dlang.org/releases/2.x/"~hostDMDVer~"/dmd."~hostDMDBase;`<br>
Becomes:<br>
`auto hostDMDURL = i"http://downloads.dlang.org/releases/2.x/$hostDMDVer/dmd.$hostDMDBase".text;`<br>

Review comment (Contributor):

A better example would be to use a non-string variable:

    import std.conv : to;
    string folder = "releases/" ~ dmd_major_version.to!string ~ ".x";
    // no `to` import needed (but `text` might be)
    string folder = i"releases/$dmd_major_version.x".text;

With syntax highlighting:<br>
![https://i.imgur.com/tXm6rBU.png](https://i.imgur.com/tXm6rBU.png)


src/dmd/json.d:1058:<br>
``s ~= prefix ~ "`" ~ enumName ~ "`";``<br>
Becomes:<br>
``s ~= i"$prefix`$enumName`".text;``<br>
With syntax highlighting:<br>
![https://i.imgur.com/KTcOS0F.png](https://i.imgur.com/KTcOS0F.png)



#### Database Queries

`db.exec("UPDATE Foo SET a = ?, b = ?, c = ?, d = ? WHERE id = ?", aval, bval, cval, dval, id);`<br>
Becomes:<br>
`db.exec(i"UPDATE Foo SET a = $(aval), b = $(bval), c = $(cval), d = $(dval) WHERE id = $(id)");`

Review comment (Contributor):

This is not equivalent, the implementation of exec has to change to receive interleaved strings and variables.
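
To make the point concrete, here is a minimal sketch of an `exec`-like helper that accepts the interleaved sequence and rebuilds a parameterized query. `Query` and `buildQuery` are hypothetical names, not part of any database library, and the sketch assumes the arguments strictly alternate between literal fragments and values, which the proposal does not guarantee.

```D
import std.array : appender;
import std.conv : to;

// Hypothetical result type: SQL text with `?` placeholders plus the
// bound values (converted to strings purely for illustration).
struct Query
{
    string sql;
    string[] params;
}

// Hypothetical helper. ASSUMPTION: arguments alternate
// literal, value, literal, value, ... starting with a literal.
Query buildQuery(Args...)(Args args)
{
    auto sql = appender!string();
    string[] params;
    foreach (i, arg; args) // unrolled at compile time, so `i` is a constant
    {
        static if (i % 2 == 0)
        {
            static assert(is(Args[i] : string), "expected a literal fragment");
            sql.put(arg);
        }
        else
        {
            sql.put("?");
            params ~= arg.to!string;
        }
    }
    return Query(sql.data, params);
}

void main()
{
    int aval = 1;
    string bval = "x";
    int id = 7;
    auto q = buildQuery("UPDATE Foo SET a = ", aval, ", b = ", bval, " WHERE id = ", id);
    assert(q.sql == "UPDATE Foo SET a = ?, b = ? WHERE id = ?");
    assert(q.params == ["1", "x", "7"]);
}
```

Note that the helper has no way to distinguish a literal fragment from a string-typed value; it relies purely on argument position.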

Review comment (Contributor):

Indeed, and actually, I would never, as a library author, write a library function that works with `i"..."` as it is defined right now because it would be so easy to do wrong.

What would stop you from writing `db.exec("UPDATE foo SET a = ", a)`? Well, nothing, and it would work fine, but just kinda coincidentally.

But what about `db.exec(i"UPDATE $(tablePrefix)table SET a = ?", 10)`? OK, it is fair to say "don't do that", but there's no way to tell, in the library, that they did!

It translates to `db.exec("UPDATE ", tablePrefix, "table SET a = ?", 10)`. At run time, you can issue a SQL syntax error, but it is impossible to tell the difference at compile time.

I hate to be the downer, cuz I basically like this proposal, but I still think we should be making this thing give a new type. Instead of a naked tuple, do an anonymous struct. Change `text` and `writeln` etc. to recognize this new kind of struct, or at the usage site, use `.tupleof`.

`i"foo $(a)".tupleof.whatever` is still an option... but then library authors who actually want to work with the details have them available. (and we should probably change `text` and `writeln` so those just work first anyway.)

Review comment (Contributor, @ntrel, May 13, 2019):

> I still think we should be making this thing give a new type. Instead of a naked tuple, do an anonymous struct

Amen. We need to be able to tell on the callee side if an interpolated sequence was passed vs just variadic args. And if it's a struct we can also have a text method wrapping std.conv.text, so we can actually get a string easily without having to keep adding a local import every time we use interpolated strings. Getting a string is a major use case, however nice parsing a sequence of elements is for efficiency.
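
A minimal sketch of what such a struct could look like, written as ordinary library code rather than compiler magic. `Interp`, `interp`, `isInterp`, and `describe` are hypothetical names introduced only for illustration; a compiler-generated type might look quite different.

```D
// Hypothetical stand-in for the compiler-generated type discussed here.
struct Interp(Args...)
{
    Args args; // the interleaved literal fragments and values

    // Zero-argument template: std.conv is only pulled in if this is called.
    string text()()
    {
        import std.conv : toText = text;
        return toText(args);
    }
}

// Helper playing the role of the i"..." literal in this sketch.
auto interp(Args...)(Args args)
{
    return Interp!Args(args);
}

// Callee-side check: was an interpolated sequence passed?
enum isInterp(T) = is(T == Interp!A, A...);

// A library function can now branch on the distinction.
string describe(T)(T arg)
{
    static if (isInterp!T)
        return "interpolated: " ~ arg.text();
    else
        return "plain argument";
}

void main()
{
    auto s = interp("1 + 2 is ", 1 + 2);
    assert(s.text() == "1 + 2 is 3");
    assert(describe(s) == "interpolated: 1 + 2 is 3");
    assert(describe("1 + 2 is 3") == "plain argument");
}
```

The caller gets a string through the `text` member without a local import, and the callee can tell an interpolated sequence apart from plain variadic arguments.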

Review comment (Contributor):

yeah, I can live with the convenience .text method. (i almost want to call it .toString to leverage existing things that call that name, but whatever, i don't care that much about the name).

I am slightly disgusted at having a magic struct created by the compiler include such a reference, but I think it works and the compiler already magically references std.math in syntax, so I can live with it. I'd just make its magic .text thing be a zero-arg template, so you don't pay for it if it isn't actually called.

I think I will write more about this in today's "this week in D"; I need to write that in a couple hours anyway and that gives me some content to dual-purpose since i missed most of dconf lol

I'll link it back here in a while. It might be a fairly different implementation though, since a magic struct cannot - I think anyway - be done with just a lexer hack. Even if you wrapped it in a call syntax to punt the details to the runtime library, you'd actually lose some functionality. (consider: foo!i"$(some_alias)". If that is passed to __d_is(some_alias), the aliasness is gone. On the other hand, foo(i"$(a + 4)") being passed to __d_is!(a+4) is liable to fail with "variable a is not accessible at compile time". So I don't think they both can work with a library function injected in the middle. Now, if I had to choose, I think the latter case is more important and I'd sacrifice the former case, but with the current PR, both work, and I think that is actually kinda cool. So I wanna preserve that. And a compiler-generated magic struct can do it.)

Anyway, I'll write more in a few hours.

Review comment (Contributor):

Yeah, since we have alias this, we can default to expanding the interpolated struct into a tuple while still giving access to methods to work with the interpolated data structure.

    writeln(i"1 + 2 is $(1+2)");

    // don't need to import std.conv.text because .text is just a member of the
    // interpolated string type
    auto s = i"1 + 2 is $(1 + 2)".text;

I'd like to know where @WalterBright and @andralex stand on this. Since they are the ones to decide what design will be accepted, I'd like to get their input on this particular aspect. But I think I agree with you on this one.

Review comment (Contributor):

My mistake, I forgot that Tuple doesn't automatically expand to a "tuple", that's kind of the whole reason why Tuple exists :)

In any case I do see a lot of versatility in using the structure. It's a very good idea and I'm on board with it. It does make a few use cases harder by sometimes requiring the user to use tupleof and/or requiring a wrapper template, but that is a small cost compared to the benefits. There will likely only be a handful of functions that need to accept interpolated strings. Though, I would like to be able to do things like:

// contrived example, you wouldn't actually do it this way
writeln(i"foo is $(foo) and bar is ", bar, i" and baz is $(baz)");

This would mean I don't think we could just use a wrapper, but would need to add a static if inside writeln to handle interpolated strings to automatically expand them.
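
A minimal sketch of that `static if` approach, under the assumption that the interpolated literal produces a distinct wrapper type. `Interp`, `interp`, `isInterp`, and `joinAll` are hypothetical names; `joinAll` stands in for `writeln` and returns the text instead of printing it.

```D
import std.conv : to;

// Hypothetical wrapper type standing in for an interpolated string.
struct Interp(Args...)
{
    Args args;
}

// Helper playing the role of the i"..." literal in this sketch.
auto interp(Args...)(Args args)
{
    return Interp!Args(args);
}

enum isInterp(T) = is(T == Interp!A, A...);

// Stand-in for writeln: flattens any interpolated arguments in place.
string joinAll(Args...)(Args args)
{
    string result;
    foreach (arg; args) // unrolled per argument at compile time
    {
        static if (isInterp!(typeof(arg)))
            result ~= joinAll(arg.args); // expand the wrapped sequence
        else
            result ~= arg.to!string;
    }
    return result;
}

void main()
{
    int foo = 1, bar = 2, baz = 3;
    // Hand-written form of:
    //     writeln(i"foo is $(foo) and bar is ", bar, i" and baz is $(baz)");
    auto s = joinAll(interp("foo is ", foo, " and bar is "), bar,
                     interp(" and baz is ", baz));
    assert(s == "foo is 1 and bar is 2 and baz is 3");
}
```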

And yes, having a toString(sink) would also be good.

Thanks for taking the time to write this up, I like it a lot. I'll think about it and may update my implementation to it.

Review comment (Contributor, @marler8997, May 14, 2019):

One more comment on your interpolation proposal. I think a big use case for interpolated strings is to make mixins much easier to write/maintain. Using q{} string literals along with interpolated strings is a big win. You mentioned in your article that you don't think they should be supported, and provided an example that you thought was ambiguous. However, it's currently well-defined in my implementation.

There's 2 ways you could go about string interpolation. You could either parse the interpolated string at the same time you are parsing the string literal, or you could first parse the string literal and then interpolate it afterwards. I opted for the second design. With this model, it's easier to determine what will happen in cases like yours because the string is processed by each part one at a time. So in your example:

    mixin(iq{
       string s = i"foo $(bar)";
    });

1. The token string is parsed. Token strings don't have escapes so it stays the same.
2. Then the interpolator runs. It ignores everything except when it sees a '$' character. The rule is made intentionally simple so a developer will be able to determine what will happen. The string above will become:

       mixin(__d_interpolatedString(`
          string s = i"foo `, bar, `";
       `));

3. Now mixin gets involved, to which interpolatedString support should be added, and it will convert bar to a string. Let's assume its string representation is "barvalue". The mixin then becomes:

       mixin(`
          string s = i"foo barvalue";
       `);

   which yields:

       string s = i"foo barvalue";

4. Now the string literal parser and interpolator run again, which gives us:

       string s = __d_interpolatedString("foo barvalue");

Now if you wanted the other behavior where the $(bar) expression is expanded later, then you would escape the $ with $$:

mixin(iq{
   string s = i"foo $$(bar)";
});

Will become:

   string s = i"foo $(bar)";

Which becomes:

     string s = __d_interpolatedString("foo ", bar);

So by default, any $(...) expressions are always expanded by the initial interpolation, and if you want to delay expansion then you can escape the dollar. The interpolation parser doesn't try to understand the characters in between the '$' expressions and escape quoted sections etc... that just ends up causing more confusion and will cause the developer headaches when they try to figure out how to get what they want from their interpolated strings.
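
The scanning rule described here can be modeled at run time with a short sketch. This is not the compiler implementation, only an approximation of the rule as stated: `Part` and `segment` are hypothetical names, the handling of a lone `$` is an assumption, and the bare `$name` form is not handled.

```D
// Hypothetical model of one segment of an interpolated string.
struct Part
{
    bool isExpr; // true for the source text of a $(...) expression
    string text;
}

// Splits `s` into literal fragments and expression source text.
// Only '$' is special: "$$" yields a literal '$', "$(" starts an
// expression that runs to the matching ')'.
Part[] segment(string s)
{
    Part[] parts;
    string lit;
    size_t i = 0;
    while (i < s.length)
    {
        if (s[i] != '$')
        {
            lit ~= s[i];
            ++i;
        }
        else if (i + 1 < s.length && s[i + 1] == '$')
        {
            lit ~= '$'; // escaped dollar
            i += 2;
        }
        else if (i + 1 < s.length && s[i + 1] == '(')
        {
            // Find the matching ')' by counting nesting depth.
            size_t depth = 1, j = i + 2;
            while (j < s.length && depth > 0)
            {
                if (s[j] == '(') ++depth;
                else if (s[j] == ')') --depth;
                ++j;
            }
            assert(depth == 0, "unterminated $( expression");
            if (lit.length)
            {
                parts ~= Part(false, lit);
                lit = null;
            }
            parts ~= Part(true, s[i + 2 .. j - 1]);
            i = j;
        }
        else
        {
            lit ~= '$'; // ASSUMPTION: a lone '$' is kept literally
            ++i;
        }
    }
    if (lit.length)
        parts ~= Part(false, lit);
    return parts;
}

void main()
{
    assert(segment("foo $(bar)") == [Part(false, "foo "), Part(true, "bar")]);
    // The escaped form survives as literal text for a later pass.
    assert(segment("foo $$(bar)") == [Part(false, "foo $(bar)")]);
}
```

Running `segment` a second time on the output of the escaped case reproduces the delayed expansion: the first pass leaves `$(bar)` as text, and the second pass treats it as an expression.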

Review comment (Contributor, @marler8997, May 14, 2019):

By the way, double quoted strings have the same issue. Your example doesn't only apply to token strings:

mixin("
   string s = i\"foo $(bar)\";
");

Review comment (Contributor):

Well, I think it is trivial with double-quoted strings because it is all syntax highlighted as a big string :) with token strings, the D lexer runs inside too so you might think it is different.

But that said, your reasoning is solid and simple to understand, so I could go with that too. (just gotta be sure we mention all these things - and explicitly mention what we decide to exclude - to get past the Walter barrier :) )

And also I guess mixin should probably be able to handle the interpolated thing too. The alias this could handle that, so could an explicit call to toString (or text), but we should prolly document it regardless.

PS this is a kinda weird place to be having this conversation, as comments on a random line of example code :P but whatever.

Review comment (Contributor, @marler8997, May 14, 2019):

Using alias this with toString would mean that using mixins with interpolated strings would require phobos.

enum SomeValue = 1234;
mixin(iq{enum AnotherValue = $(SomeValue + 1);});

Seems wrong to me....I'll have to think about this one.



## Description

Lexer Change:

Current:

```
Token:
...
StringLiteral
...
```
New:

```
Token:
...
StringLiteral
i StringLiteral
...
```

Review comment on `i StringLiteral` (Contributor):

I don't think it makes sense to have an interpolated WysiwygString, it isn't WYSIWYG. You also can't have escapes so you can't have a $ character.

Review comment (Contributor):

Either way the DIP should specify what happens.

Review comment (Contributor):

The string literal itself is still WYSIWYG. It's just that afterwards, it's processed by the interpolator. You can think of the interpolator like an operator and/or function call. For example, if you passed a WYSIWYG string to the "toUpper" function, the string wouldn't be WYSIWYG.

Apart from this lexical change, the grammar is unchanged. The implementation consists of a small change to `lex.d` that detects string literals prefixed with the `i` character and sets a boolean flag on them to mark them as "interpolated". In the parse stage, any string literal marked as "interpolated" is lowered to a sequence of strings and expressions.

Implementation and tests can be found here: https://github.com/dlang/dmd/pull/7988/files

## Language Feature vs Library Feature

It has been brought up that this could be done as a library. Here is a breakdown of the pros and cons of a library implementation as opposed to a language implementation:

:white_check_mark: Library Pros:
- Requires no language changes

:x: Library Cons:
- Awkward syntax
- Bad performance
- Depends on a library for a trivial feature
- Cannot be used with betterC


:white_check_mark: Language Pros:
- High performance
- Nice syntax
- Better integration (IDEs, syntax highlighting, autocompletion)

:x: Language Cons:
- Requires a language change

## Breaking Changes and Deprecations
None :smile:

## Copyright & License

Copyright (c) 2018 by the D Language Foundation

Licensed under [Creative Commons Zero 1.0](https://creativecommons.org/publicdomain/zero/1.0/legalcode.txt)

## Reviews

The DIP Manager will supplement this section with a summary of each review stage
of the DIP process beyond the Draft Review.