From e22591c0fd0865c354a0b401192e10317dac5c72 Mon Sep 17 00:00:00 2001 From: Huon Wilson Date: Fri, 27 Jun 2014 18:39:13 +1000 Subject: [PATCH 1/4] `ManuallyDrop` type gives precise control of dtors of inline data. --- active/0000-manually-drop.md | 194 +++++++++++++++++++++++++++++++++++ 1 file changed, 194 insertions(+) create mode 100644 active/0000-manually-drop.md diff --git a/active/0000-manually-drop.md b/active/0000-manually-drop.md new file mode 100644 index 00000000000..fa4efcceb3f --- /dev/null +++ b/active/0000-manually-drop.md @@ -0,0 +1,194 @@ +- Start Date: 2014-06-27 +- RFC PR #: (leave this empty) +- Rust Issue #: (leave this empty) + +# Summary + +Provide a `ManualDrop` type that stores a `T` inline in memory, but +doesn't run `T`s destructor by default. + +# Motivation + +There is currently no long-term way to have partially (un)initialised +inline data in types, since internal fields always have their own +destructor run. This is effectively generalising the behaviour of +`*const` and `*mut` (which do not run destructors on their contents +automatically) to inline data, i.e. not requiring allocation or other +indirection like the pointer types. + +The motivating example of this is a `SmallVec` type +(e.g. [those defined by servo][servosmallvec]), where vector instances +with a small number of elements are stored directly inline, not on the +heap. Something like + +```rust +struct SmallVec { + length: uint, + capacity: uint, + pointer: *mut T, + inline: [T, .. 8] +} +``` + +[servosmallvec]: https://github.com/servo/servo/blob/bc9127c499793177bd3826dd3c1c77ff294cede3/src/components/util/smallvec.rs + +As an example of its behaviour consider: + +```rust +let mut v = SmallVec::new(); +// v.length == v.capacity == 0 +// v.pointer == NULL +// v.inline = [uninit, uninit, ... ] + + +v.push("foo".to_string()); +// v.length == 1, v.capacity == 0 (or something) +// v.pointer == NULL +// v.inline = ["foo", uninit, ... ] + +for _ in range(0, 99) { + v.push("bar".to_string()); +} +// v.length == 100, v.capacity >= 100 +// v.pointer == some allocation containing "foo", "bar", "bar", ... +// v.inline = [uninit, uninit, ... ] +``` + +When a `SmallVec` with the above definition is dropped, the +destructors of all 8 `T`s in `inline` will always be run (no matter +what happens in the `Drop` implementation of `SmallVec`), leading to +dramatic unsafety: in the first and last cases, destructors would be +running on uninitialised data, and in the second, it would correctly +destroy the first element (`"foo"`) but the last 7 are uninitialised. + +With the `ManuallyDrop` type given in this RFC, `SmallVec` could be +defined as + +```rust +// (sketch, without `unsafe {}`s/casts etc. for clarity) + +struct SmallVec { + length: uint, + capacity: uint, + pointer: *mut T, + + // now these 8 T's are not automatically destroyed when a SmallVec + // goes out of scope. + inline: ManuallyDrop<[T, .. 8]> +} + +impl Drop for SmallVec { + fn drop(&mut self) { + if !self.pointer.is_null() { // heap allocated + // same as Vec's drop + for i in range(0, self.length) { + ptr::read(self.pointer.offset(i)); + } + alloc::heap::deallocate(self.pointer, ...); + } else { + // run destructors on the first `self.length` elements of + // `self.inline`, but no others. + for i in range(0, self.length) { + ptr::read(&self.inline.get().offset(i)); + } + } + } +} +``` + +This definition is now safe: destructors run on exactly those elements +that are valid and no others. + + +To be precise, at the time of writing, Rust defines that destructors +are safe to run on zeroed values (and guarantees it via the so called +"drop flag"), meaning this can currently be made safe via: + +```rust +impl Drop for SmallVec { + fn drop(&mut self) { + // same as above. + + // zero the whole array: it's fine for the `inner` field to be + // redestroyed now. + mem::set_memory(&mut self.inner as *mut T, 0, self.inner.len()); + } +} +``` + +However, the drop flag is likely to disappear: +[#5016](https://github.com/mozilla/rust/issues/5016). + +# Detailed design + +A new lang-item type equivalent to + +```rust +pub struct ManuallyDrop { + pub data: T +} +``` + +would be provided, and the compiler would know to *not* run the +destructor of the `T` when a `ManuallyDrop` instance goes out of +scope. That is, + +```rust +struct X { num: int }; +impl Drop for X { + fn drop(&mut self) { println!("{}", self.num) } +} + +fn main() { + let normal = X { x: 0 }; + let new = ManuallyDrop { data: X { x: 1 } }; +} +``` + +would print only `0`, *not* `1`. + +This would be modelled after the `UnsafeCell` type, providing similar +methods: + +- `fn get(&self) -> *const T` +- `fn get_mut(&mut self) -> *mut T` +- `unsafe fn unwrap(self) -> T` + +The `data` field is public to allow for static initialisation (again, +similar to `UnsafeCell`). + +# Drawbacks + +This is quite dangerous. + +# Alternatives + +- This may be adequately modeled by enums in many situations, but I + can't imagine it will be easy to wrangle `enum`s to make `SmallVec` + work nicely (one variant for each possible on-stack arity? How do + you avoid adding an (unnecessary) additional work for the + discriminant?), and especially not if we ever get generic number + parameters, so something like `struct SmallVec { ... inner: ManuallyDrop<[T, .. n]> }` + would work, allowing for `SmallVec`, `SmallVec` etc. + +- Continue relying on the drop flag/zeroing. + +- A struct `UninterpretedBytesOfSize` equal to + `[u8, .. size_of::()]`, that is, a chunk of memory large enough + to store a `T`, but treated as raw memory (i.e. `u8`s). This has the + (large) downside of loosing all type information, inferring with the + compiler's reachability analysis (e.g. for `UnsafeCell`), and making + it easier for the programmer to make mistakes w.r.t. an incorrect or + forgotten coercion (it's would be identical to C's `void*`). + +# Unresolved questions + +- Should a `ManuallyDrop` always be `Copy`? It no longer has a + destructor and so the only risk of double freeing (etc.) would be + when the user writes such a thing. This would allow it to be a + maximally flexible building-block, but I cannot think of a specific + use-case for `ManuallyDrop` to be `Copy`. Being `Copy` + would make `ManuallyDrop` entirely the inline equivalent of `*const` + and `*mut`, since they are both `Copy` always. + +- The name. From 287976652aa1a889f931f22744bbec7f24f7669f Mon Sep 17 00:00:00 2001 From: Huon Wilson Date: Wed, 13 Aug 2014 22:43:52 +1000 Subject: [PATCH 2/4] Correct speeling. --- active/0000-manually-drop.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/active/0000-manually-drop.md b/active/0000-manually-drop.md index fa4efcceb3f..cd3d2ecb2db 100644 --- a/active/0000-manually-drop.md +++ b/active/0000-manually-drop.md @@ -176,7 +176,7 @@ This is quite dangerous. - A struct `UninterpretedBytesOfSize` equal to `[u8, .. size_of::()]`, that is, a chunk of memory large enough to store a `T`, but treated as raw memory (i.e. `u8`s). This has the - (large) downside of loosing all type information, inferring with the + (large) downside of losing all type information, interfering with the compiler's reachability analysis (e.g. for `UnsafeCell`), and making it easier for the programmer to make mistakes w.r.t. an incorrect or forgotten coercion (it's would be identical to C's `void*`). From 7b02f8007eb82cc3fe3fe51cb99da74f2f74ec3c Mon Sep 17 00:00:00 2001 From: Huon Wilson Date: Fri, 19 Jun 2015 10:25:15 -0700 Subject: [PATCH 3/4] Update to reflect Rust changes. --- active/0000-manually-drop.md | 108 ++++++++++++++++++++++++++++------- 1 file changed, 87 insertions(+), 21 deletions(-) diff --git a/active/0000-manually-drop.md b/active/0000-manually-drop.md index cd3d2ecb2db..1b3face1e3c 100644 --- a/active/0000-manually-drop.md +++ b/active/0000-manually-drop.md @@ -17,10 +17,13 @@ automatically) to inline data, i.e. not requiring allocation or other indirection like the pointer types. The motivating example of this is a `SmallVec` type -(e.g. [those defined by servo][servosmallvec]), where vector instances +(e.g. servo's [smallvec], or @bluss's [arrayvec]), where vector instances with a small number of elements are stored directly inline, not on the heap. Something like +[smallvec]: https://github.com/servo/rust-smallvec +[arrayvec]: https://github.com/bluss/arrayvec + ```rust struct SmallVec { length: uint, @@ -30,8 +33,6 @@ struct SmallVec { } ``` -[servosmallvec]: https://github.com/servo/servo/blob/bc9127c499793177bd3826dd3c1c77ff294cede3/src/components/util/smallvec.rs - As an example of its behaviour consider: ```rust @@ -101,7 +102,7 @@ that are valid and no others. To be precise, at the time of writing, Rust defines that destructors -are safe to run on zeroed values (and guarantees it via the so called +are safe to run on destroyed values (and guarantees it via the so called "drop flag"), meaning this can currently be made safe via: ```rust @@ -109,15 +110,20 @@ impl Drop for SmallVec { fn drop(&mut self) { // same as above. - // zero the whole array: it's fine for the `inner` field to be - // redestroyed now. - mem::set_memory(&mut self.inner as *mut T, 0, self.inner.len()); + // flag the whole array as dropped: it's fine for the `inner` + // field to be redestroyed now. + ptr::write_bytes(&mut self.inner as *mut T, mem::POST_DROP_U8, self.inner.len()); } } ``` -However, the drop flag is likely to disappear: -[#5016](https://github.com/mozilla/rust/issues/5016). +However, the drop flag is likely to disappear: [#5016]. + +[#5016]: https://github.com/mozilla/rust/issues/5016 + +NB. not running destructors is now safe in Rust, so this type isn't +inherently `unsafe`, except for the fact that it is designed to store +invalid instances of other types. # Detailed design @@ -125,7 +131,7 @@ A new lang-item type equivalent to ```rust pub struct ManuallyDrop { - pub data: T + data: T } ``` @@ -150,16 +156,34 @@ would print only `0`, *not* `1`. This would be modelled after the `UnsafeCell` type, providing similar methods: +- `const fn new(x: T) -> ManuallyDrop` - `fn get(&self) -> *const T` - `fn get_mut(&mut self) -> *mut T` -- `unsafe fn unwrap(self) -> T` +- `unsafe fn into_inner(self) -> T` + +Additional initialisation functions can be made available: + +- `fn zeroed()` +- `fn uninitialized()` -The `data` field is public to allow for static initialisation (again, -similar to `UnsafeCell`). +(These would be `const fn` if possible, but it is not clear to me that +it is right now.) -# Drawbacks +The `ManuallyDrop` type would be a black box for representation +optimisations: it is explicitly designed to be able to store arbitrary +junk, and so the assumptions made by the compiler conventionally may +not hold (for example, the `Option` null pointer optimisation [will +break] if a pointer is null). As a concrete example: -This is quite dangerous. +[will break]: https://github.com/servo/rust-smallvec/issues/5 + +```rust +type T = Option>; +type T_MD = Option>>; + +assert_eq!(size_of::(), 8); +assert_eq!(size_of::(), 16); +``` # Alternatives @@ -171,15 +195,57 @@ This is quite dangerous. parameters, so something like `struct SmallVec { ... inner: ManuallyDrop<[T, .. n]> }` would work, allowing for `SmallVec`, `SmallVec` etc. -- Continue relying on the drop flag/zeroing. - - A struct `UninterpretedBytesOfSize` equal to `[u8, .. size_of::()]`, that is, a chunk of memory large enough to store a `T`, but treated as raw memory (i.e. `u8`s). This has the - (large) downside of losing all type information, interfering with the - compiler's reachability analysis (e.g. for `UnsafeCell`), and making - it easier for the programmer to make mistakes w.r.t. an incorrect or - forgotten coercion (it's would be identical to C's `void*`). + (large) downside of losing all type information, interfering with + the compiler's reachability analysis (e.g. for `UnsafeCell`), and + making it easier for the programmer to make mistakes w.r.t. an + incorrect or forgotten coercion (it's would be identical to C's + `void*`). This is getting more feasible with [`const fn`] support. + +- Change drop to have (semantically) take full ownership of its + contents, so that `mem::forget` works, e.g. `trait Drop { fn + drop(self); }` or a design like that in [@eddyb's comment]. + +- Make no change and just perform manual ownership control via + `Option`: a data type can store `Option` instead of `T` with + the invariant that the `Option` is always `Some` except for when the + destructor runs. This allows one to implement the previous + alternative with `take`. E.g. `ManuallyDrop` can be shimmed as + something like: + + ```rust + struct ManuallyDrop { x: Option } + impl ManuallyDrop { + const fn new(x: T) -> ManuallyDrop { + ManuallyDrop { x: Some(x) } + } + fn get(&self) -> *const T { + self.x.as_ref().unwrap() + } + fn get_mut(&mut self) -> *mut T { + self.x.as_ref_mut().unwrap() + } + unsafe fn into_inner(self) -> T { + self.x.unwrap() + } + } + + impl Drop for ManuallyDrop { + fn drop(&mut self) { mem::forget(self.x.take()) } + } + ``` + + This approach is used by [arrayvec] but has many downsides: + + - (much) larger compiled code when not using `unsafe` (`unwrap` + includes branches and unwinding etc.) + - more invariants to track + - larger data types + +[`const fn`]: https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md +[@eddyb's comment]: https://github.com/rust-lang/rfcs/pull/197#issuecomment-110850383 # Unresolved questions From 386bf27328ae224bb16eccea80e348284c7ec82b Mon Sep 17 00:00:00 2001 From: Huon Wilson Date: Fri, 26 Jun 2015 16:27:51 -0700 Subject: [PATCH 4/4] Address @bluss's notes. --- text/0000-manually-drop.md | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/text/0000-manually-drop.md b/text/0000-manually-drop.md index 873f8e3c5fa..4490b434f41 100644 --- a/text/0000-manually-drop.md +++ b/text/0000-manually-drop.md @@ -203,16 +203,19 @@ assert_eq!(size_of::(), 16); the compiler's reachability analysis (e.g. for `UnsafeCell`), and making it easier for the programmer to make mistakes w.r.t. an incorrect or forgotten coercion (it's would be identical to C's - `void*`). This is getting more feasible with [`const fn`] support. + `void*`). This is getting more feasible with [`const fn`] + support. There needs to be support for explicitly specifying the + alignment of the type too (to match `T`). - Change drop to have (semantically) take full ownership of its contents, so that `mem::forget` works, e.g. `trait Drop { fn drop(self); }` or a design like that in [@eddyb's comment]. - Make no change and just perform manual ownership control via - `Option`: a data type can store `Option` instead of `T` with - the invariant that the `Option` is always `Some` except for when the - destructor runs. This allows one to implement the previous + `Option` (or something similar, designed to avoid invalid layout + optimisations): a data type can store `Option` instead of `T` + with the invariant that the `Option` is always `Some` except for + when the destructor runs. This allows one to implement the previous alternative with `take`. E.g. `ManuallyDrop` can be shimmed as something like: