Skip to content

Conversation

@bschoenmaeckers
Copy link
Member

No description provided.

@bschoenmaeckers bschoenmaeckers force-pushed the PyUnicodeWriter branch 3 times, most recently from f0df624 to eb46ca2 Compare June 19, 2025 13:09
@bschoenmaeckers bschoenmaeckers force-pushed the PyUnicodeWriter branch 7 times, most recently from 0897456 to 203dec2 Compare June 20, 2025 07:16
@bschoenmaeckers bschoenmaeckers force-pushed the PyUnicodeWriter branch 3 times, most recently from aea618c to c28a9e4 Compare July 14, 2025 08:34
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for taking so long to come to this. This API looks pretty tempting as a way to squeeze some performance out of e.g. error pathways where we need to format Python strings from error values.

It would be interesting to see benchmarks compared to the existing pattern of formatting to string and then copying into a new Python string.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The choice to write compat logic using private APIs of older Python versions makes me a little uneasy. Is the performance advance so significant that we can justify it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the compat layer because it is faster to do it in rust. See #5199 (comment)

src/fmt.rs Outdated
Comment on lines 84 to 119
#[cfg(not(any(Py_LIMITED_API, PyPy)))]
impl fmt::Write for PyUnicodeWriter {
fn write_str(&mut self, s: &str) -> fmt::Result {
let result = unsafe {
PyUnicodeWriter_WriteUTF8(self.as_ptr(), s.as_ptr().cast(), s.len() as isize)
};
if result < 0 {
self.set_error();
Err(fmt::Error)
} else {
Ok(())
}
}

fn write_char(&mut self, c: char) -> fmt::Result {
let result = unsafe { PyUnicodeWriter_WriteChar(self.as_ptr(), c as u32) };
if result < 0 {
self.set_error();
Err(fmt::Error)
} else {
Ok(())
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider any buffering on the Rust side to avoid making hotly repeated FFI calls? Possibly a follow-up.

src/fmt.rs Outdated
Comment on lines 22 to 38
/// This is like the `format!` macro, but it returns a `PyString` instead of a `String`.
#[macro_export]
macro_rules! py_format {
($py: expr, $($arg:tt)*) => {
$crate::types::PyString::from_fmt($py, format_args!($($arg)*))
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to decide if we need this API if we already have PyString::from_fmt. I like the macro conciseness, though in the past we've had pushback against adding too many macros.

Interesting thing to note here is that format_args! has an as_str() const fn, so I wonder if there's a crazy way that we can make this automatically intern! the value if the format string is a constant 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having troubles merging the py_format! macro and the intern! macro. Currently the intern! macro returns a &'static Bound<'py, PyString>, but py_format! has to return a dynamic lifetime reference. I am not able to do this without leaking (static) variables into the users scope.

Do you have any smart ideas the solve this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like I need something like a Python Cow 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could always just return Bound<'py, PyString> from py_format!? At least in the interned case it would be returning the existing string, just increasing the reference count.

src/fmt.rs Outdated
Comment on lines 30 to 47
#[cfg(not(any(Py_LIMITED_API, PyPy)))]
/// The `PyUnicodeWriter` is a utility for efficiently constructing Python strings
pub struct PyUnicodeWriter {
writer: NonNull<ffi::PyUnicodeWriter>,
last_error: Option<PyErr>,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an advantage to having this as a public type rather than an implementation detail to PyString::from_fmt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The advantage is that the user can use the fmt::Write trait to write into a PyString, using functions that accept a generic fmt::Write impl.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An other option is to hide the writer implementation using something like the following,

impl PyString {
    pub fn with_writer<F, W>(
        py: Python,
        f: F,
    ) -> Bound<PyString>
    where
        F: FnOnce(&impl fmt::Write);
}

Copy link
Member Author

@bschoenmaeckers bschoenmaeckers Jan 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made PyUnicodeWriter private for now. So only PyString::from_fmt and py_format! are public.

@bschoenmaeckers bschoenmaeckers force-pushed the PyUnicodeWriter branch 2 times, most recently from 0dc648f to acbc13e Compare September 19, 2025 07:57
@codspeed-hq
Copy link

codspeed-hq bot commented Sep 19, 2025

Merging this PR will not alter performance

✅ 98 untouched benchmarks
🆕 2 new benchmarks

Performance Changes

Benchmark BASE HEAD Efficiency
🆕 format_simple N/A 154.2 ns N/A
🆕 format_complex N/A 7.5 µs N/A

Comparing bschoenmaeckers:PyUnicodeWriter (0bdab1f) with main (49e8772)

Open in CodSpeed

@bschoenmaeckers bschoenmaeckers force-pushed the PyUnicodeWriter branch 3 times, most recently from 2cc5814 to e5c3008 Compare September 19, 2025 08:24
@bschoenmaeckers
Copy link
Member Author

I ran some benchmarks with interesting results.

simple case that can be optimised to a static string

Before: 3.7 µs (3700 ns)

let format_args = format_args!("Hello {}!", "world");
PyString::new(py, &format!("{format_args}"))

After: 219.2 ns

py_format!(py, "Hello {}!", "world")

For static string this is a lot faster.

complex case with variable formatting arguments

Before: 7 µs

let value = (black_box(42), black_box("foo"), [0; 0]);
let format_args = format_args!("This is some complex value: {value:?}");
PyString::new(py, &format!("{format_args}"))

After: 9 µs

let value = (black_box(42), black_box("foo"), [0; 0]);
py_format!(py, "This is some complex value: {value:?}")

This case of py_format! is slower when it uses the PyUnicodeWriter. Are these FFI calls that slow? Maybe we have to buffer in rust first?

@bschoenmaeckers
Copy link
Member Author

This case of py_format! is slower when it uses the PyUnicodeWriter. Are these FFI calls that slow? Maybe we have to buffer in rust first?

I now see what happens, it is using the compat version of PyUnicodeWriter_WriteUTF8. This creates a PyString internally so this is obvious not faster.

#[cfg(not(any(Py_LIMITED_API, PyPy)))]
compat_function!(
    originally_defined_for(all(Py_3_14, not(Py_LIMITED_API)));

    pub unsafe fn PyUnicodeWriter_WriteUTF8(writer: *mut crate::PyUnicodeWriter,str: *const std::os::raw::c_char, size: crate::Py_ssize_t) -> std::os::raw::c_int {
        let size = if size < 0 {
            libc::strlen(str) as isize
        } else {
            size
        };

        let py_str = crate::PyUnicode_FromStringAndSize(str, size);
        if py_str.is_null() {
            return -1;
        }

        let result = crate::_PyUnicodeWriter_WriteStr(writer.cast(), py_str);
        crate::Py_DECREF(py_str);
        result
    }
);

I don't think the compat version is that useful over buffering into a rust String. What do you think @davidhewitt?

@davidhewitt
Copy link
Member

I don't think the compat version is that useful over buffering into a rust String. What do you think @davidhewitt?

I completely agree, I think using the private CPython APIs to no advantage is unfortunate. So this means that on Python 3.13 and below we should just make the non-static macro expand to PyString::new(py, format!{...}) and do the writing in Rust, I guess?

What happens if you run the benchmark on Python 3.14? In my opinion that's the interesting question, if it's faster on 3.14 this feature seems justified, if it's always slower then it may not be worth it? 🤔

@bschoenmaeckers
Copy link
Member Author

I don't think the compat version is that useful over buffering into a rust String. What do you think @davidhewitt?

I completely agree, I think using the private CPython APIs to no advantage is unfortunate. So this means that on Python 3.13 and below we should just make the non-static macro expand to PyString::new(py, format!{...}) and do the writing in Rust, I guess?

What happens if you run the benchmark on Python 3.14? In my opinion that's the interesting question, if it's faster on 3.14 this feature seems justified, if it's always slower then it may not be worth it? 🤔

I think this is marginally faster. Around 1-2% faster. There may be bigger wins when converting larger strings or when we buffer in rust first.

@davidhewitt
Copy link
Member

I don't think the compat version is that useful over buffering into a rust String. What do you think @davidhewitt?

I completely agree, I think using the private CPython APIs to no advantage is unfortunate. So this means that on Python 3.13 and below we should just make the non-static macro expand to PyString::new(py, format!{...}) and do the writing in Rust, I guess?
What happens if you run the benchmark on Python 3.14? In my opinion that's the interesting question, if it's faster on 3.14 this feature seems justified, if it's always slower then it may not be worth it? 🤔

I think this is marginally faster. Around 1-2% faster. There may be bigger wins when converting larger strings or when we buffer in rust first.

Fair enough. I guess overall we're not changing the algorithmic complexity, just cutting the constant factor. I can also imagine that performance may be heavily dependent on whether the particular thing being formatted makes use of WriteChar or WriteUTF8. I could imagine a 512 byte Vec<u8> Rust buffer could go a long way towards avoiding cost of frequent WriteChar calls, for example.

Another appealing property of the proposal here is that it makes large string creation not peak at ~2x the actual memory (full Rust and Python strings allocated).

Overall I feel sympathetic to this proposal, it seems like it should be a good thing on Python 3.14+. On older Pythons it seems like we should just make the macro "do the dumb thing" as per #5199 (comment) (maybe with the static interning optimization still).

Intern static strings in `py_format!`


Add `'py` lifetime to `PyUnicodeWriter`

Add benchmark

Bench using new macro

Bench on python 3.14

Don't measure pystring dealloc

Remove PyUnicodeWriter compat shim

Add `TryInto` impl
@bschoenmaeckers bschoenmaeckers marked this pull request as ready for review January 17, 2026 12:29
@bschoenmaeckers
Copy link
Member Author

I did a final pass over this & rebased. So should be ready for review.

@bschoenmaeckers bschoenmaeckers changed the title Add PyString::from_fmt using new PyUnicodeWriter Add PyString::from_fmt & py_format! using new PyUnicodeWriter api Jan 17, 2026
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is looking good to me.

Is it worth having a pass over the .md files and also the code itself to find places where we can apply this? e.g. I spotted at least one use in coroutine.rs, I expect there must be more.

src/fmt.rs Outdated
Comment on lines 1 to 4
//! This module provides the `PyUnicodeWriter` struct, which is a utility for efficiently
//! constructing Python strings using Rust's `fmt::Write` trait.
//! It allows for incremental string construction, without the need for repeated allocations, and
//! is particularly useful for building strings in a performance-sensitive context.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to make this module pub(crate) for now or remove this doc?

src/fmt.rs Outdated
std::ptr::NonNull,
};

/// This is like the `format!` macro, but it returns a `PyString` instead of a `String`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given macros can be a bit hard to figure out the API on, it would probably be nice to expand the docs to clarify that the arguments are exactly like format!, but with py as first argument.

We might want to comment about how if the string is optimized to a static string at compile time then this macro offers the advantage of string interning over the PyString::from_fmt API.

Probably want to add an example usage here too, e.g.

let py_string: Bound<'_, PyString> = py_format!(py, "{} {}", "hello", "world");

Comment on lines +27 to +33
static INTERNED: $crate::sync::PyOnceLock<$crate::Py<$crate::types::PyString>> = $crate::sync::PyOnceLock::new();
Ok(
INTERNED
.get_or_init($py, || $crate::types::PyString::intern($py, static_string).unbind())
.bind($py)
.to_owned()
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this work?

Suggested change
static INTERNED: $crate::sync::PyOnceLock<$crate::Py<$crate::types::PyString>> = $crate::sync::PyOnceLock::new();
Ok(
INTERNED
.get_or_init($py, || $crate::types::PyString::intern($py, static_string).unbind())
.bind($py)
.to_owned()
)
$crate::intern!($py, static_string).to_owned()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails with attempt to use a non-constant value in a constant. Because static_string is not defined as a constant, and cannot be until our MSRV is 1.84.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants