This shows you the differences between two versions of the page.
| hello [2021-05-25] – dcai | hello [2024-05-07] (current) – dcai | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | [[blog/ | + | ==== Hello ==== |
| - | + | ||
| - | + | ||
| - | ---- | + | |
| - | + | ||
| - | |{{/ | + | |
| - | |{{/ | + | |
| - | |{{/ | + | |
| - | + | ||
| - | \\ | + | |
| - | + | ||
| - | + | ||
| - | ---- | + | |
| - | + | ||
| - | + | ||
| - | Over the past several weeks I have been attempting to reimplement the API of an existing python library as a wrapper for an equivalent library in Rust. | + | |
| - | + | ||
| - | tl;dr: this ended up being much harder than I expected it to be, partly because of important differences in the behaviour of the two languages, and partly because of the (self-imposed) obligation to match an existing (idiomatic) python API. | + | |
| - | + | ||
| - | ===== Motivation | + | |
| - | + | ||
| - | Python is the traditional language of choice for font tools. Popular font editors generally support extensions written in python, and type designers and foundries frequently have extensive collections of scripts and tools for doing font QA, producing proofs, and generating compiled font files. | + | |
| - | + | ||
| - | As we [[https:// | + | |
| - | + | ||
| - | ===== Language differences ===== | + | |
| - | + | ||
| - | The main challenge faced with this project is working around the fundamental differences between Rust and python, particularly around ownership and mutability. A [[https:// | + | |
| - | + | ||
| - | <code highlight> | + | |
| - | font = Font.open(" | + | |
| - | glyphA = font.layers.defaultLayer[" | + | |
| - | point = glyphA.contours[0].points[0] | + | |
| - | point.x = 404 | + | |
| - | assert point.x == glyphA.contours[0].points[0].x | + | |
| - | </ | + | |
| - | + | ||
| - | + | ||
| - | + | ||
| - | More succinctly: python expects you to share references to things. When you create a binding to a point in a contour, that binding refers to the same data as the original point, and modifying the binding modifies the collection. | + | |
| - | + | ||
| - | This doesn’t really translate to Rust: Rust is much more restrictive about handing out references. | + | |
| - | + | ||
| - | ==== Interior mutability ==== | + | |
| - | + | ||
| - | My initial plan was to just make extensive use of interior mutability, which is a pattern available in Rust for dealing with these sorts of situations. This would require each of the Rust types I would like to expose to python to be behind a shared pointer, with some mechanism for ensuring that access is unique at any given time. | + | |
| - | + | ||
| - | This means converting from something that looks like this, | + | |
| - | + | ||
| - | <code highlight> | + | |
| - | struct Font { | + | |
| - | layers: Map< | + | |
| - | } | + | |
| - | + | ||
| - | struct Layer { | + | |
| - | glyphs: Map< | + | |
| - | } | + | |
| - | + | ||
| - | struct Glyph { | + | |
| - | contours: Vec< | + | |
| - | components: Vec< | + | |
| - | } | + | |
| - | + | ||
| - | struct Contour { | + | |
| - | points: Vec< | + | |
| - | } | + | |
| - | + | ||
| - | struct Component { | + | |
| - | base_glyph: String, | + | |
| - | transform: AffineTransformation, | + | |
| - | } | + | |
| - | </ | + | |
| - | + | ||
| - | + | ||
| - | + | ||
| - | To something that looks like this: | + | |
| - | + | ||
| - | <code highlight> | + | |
| - | struct SharedVec< | + | |
| - | struct SharedMap< | + | |
| - | + | ||
| - | struct Font { | + | |
| - | layers: SharedMap< | + | |
| - | } | + | |
| - | + | ||
| - | struct Layer { | + | |
| - | glyphs: SharedMap< | + | |
| - | } | + | |
| - | + | ||
| - | struct Glyph { | + | |
| - | contours: SharedVec< | + | |
| - | components: SharedVec< | + | |
| - | } | + | |
| - | + | ||
| - | // etc. | + | |
| - | </ | + | |
| - | + | ||
| - | + | ||
| - | + | ||
| - | This was my initial approach, but it started to become pretty verbose, pretty quickly. In hindsight it may have ultimately been simpler than where I //did// end up, but, well, that’s hindsight. | + | |
| - | + | ||
| - | ==== Proxy objects ==== | + | |
| - | + | ||
| - | Ultimately, I settled on a different approach. Instead of having actual shared objects that are actually mutated, you have a ‘proxy object’. This is a reference to a single shared object (in this case, the Font object) and then a mechanism (something like a [[https:// | + | |
| - | + | ||
| - | In this world, our Layer object looks more like this: | + | |
| - | + | ||
| - | <code highlight> | + | |
| - | struct FontProxy(Arc< | + | |
| - | + | ||
| - | struct LayerProxy { | + | |
| - | font: FontProxy, | + | |
| - | layer_name: String, | + | |
| - | } | + | |
| - | </ | + | |
| - | + | ||
| - | + | ||
| - | + | ||
| - | And then we just need some way of retrieving the inner layer object from the font as needed. | + | |
| - | + | ||
| - | Ideally this would look something like this: | + | |
| - | + | ||
| - | <code highlight> | + | |
| - | impl LayerProxy { | + | |
| - | fn get(& | + | |
| - | self.font.0.lock().unwrap().layers.get(& | + | |
| - | } | + | |
| - | + | ||
| - | fn get_mut(& | + | |
| - | self.font.0.lock().unwrap().layers.get_mut(& | + | |
| - | } | + | |
| - | } | + | |
| - | </ | + | |
| - | + | ||
| - | + | ||
| - | + | ||
| - | But this doesn’t quite work, for two reasons. First, because each access of the object represented by the proxy requires acquiring a lock on the underlying font object, we can’t just return a reference. The lock is only held for the scope of the function, so the minute we return we lose the lock, and our reference would be invalid. Instead, we need to do whatever work is required inside this function, which we can do easily enough by passing a closure that takes a '' | + | |
| - | + | ||
| - | <code highlight> | + | |
| - | impl LayerProxy { | + | |
| - | fn with< | + | |
| - | self.font.0.lock().unwrap().layers.get(& | + | |
| - | .map(f) | + | |
| - | .ok_or_else(|| ProxyError:: | + | |
| - | } | + | |
| - | + | ||
| - | fn with_mut< | + | |
| - | self.font.0.lock().unwrap().layers.get_mut(& | + | |
| - | .map(f) | + | |
| - | .ok_or_else(|| ProxyError:: | + | |
| - | } | + | |
| - | } | + | |
| - | </ | + | |
| - | + | ||
| - | + | ||
| - | + | ||
| - | With this in place, we can implement our API on top of the proxy object: | + | |
| - | + | ||
| - | <code highlight> | + | |
| - | impl LayerProxy { | + | |
| - | fn len(& | + | |
| - | self.with(|layer| layer.len()) | + | |
| - | } | + | |
| - | + | ||
| - | fn remove_glyph(& | + | |
| - | self.with_mut(|layer| layer.remove_glyph(name)) | + | |
| - | } | + | |
| - | } | + | |
| - | </ | + | |
| - | + | ||
| - | + | ||
| - | + | ||
| - | A nice property of these proxy objects is that they can be implemented in terms of each other. Just as a glyph is contained by a layer, a '' | + | |
| - | + | ||
| - | <code highlight> | + | |
| - | struct GlyphProxy { | + | |
| - | layer: LayerProxy, | + | |
| - | glyph_name: String, | + | |
| - | } | + | |
| - | + | ||
| - | impl GlyphProxy { | + | |
| - | fn with< | + | |
| - | self.layer.with(|layer| { | + | |
| - | layer.get(& | + | |
| - | .map(f) | + | |
| - | .ok_or_else(|| ProxyError:: | + | |
| - | })? | + | |
| - | } | + | |
| - | + | ||
| - | fn with_mut< | + | |
| - | self.layer.with_mut(|layer| { | + | |
| - | layer.get_mut(& | + | |
| - | .map(f) | + | |
| - | .ok_or_else(|| ProxyError:: | + | |
| - | })? | + | |
| - | } | + | |
| - | } | + | |
| - | </ | + | |
| - | + | ||
| - | + | ||
| - | + | ||
| - | This… mostly works? But it gets complicated quickly once we start to deal with lists. | + | |
| - | + | ||
| - | The next object we want to deal with isn’t a single object, but rather the list of contours in a glyph. This is easy enough; we can just reuse the '' | + | |
| - | + | ||
| - | <code highlight> | + | |
| - | struct GlyphContoursProxy(GlyphProxy); | + | |
| - | + | ||
| - | impl GlyphContoursProxy { | + | |
| - | fn with< | + | |
| - | self.0.with(|glyph| f(& | + | |
| - | } | + | |
| - | + | ||
| - | fn with_mut< | + | |
| - | self.0.with_mut(|glyph| f(&mut glyph.contours)) | + | |
| - | } | + | |
| - | } | + | |
| - | + | ||
| - | struct ContourProxy { | + | |
| - | contours: GlyphContoursProxy, | + | |
| - | idx: usize, | + | |
| - | } | + | |
| - | </ | + | |
| - | + | ||
| - | + | ||
| - | + | ||
| - | And… things start to get a bit tricky here. | + | |
| - | + | ||
| - | Consider the following code: | + | |
| - | + | ||
| - | <code highlight> | + | |
| - | glyph = myFont[" | + | |
| - | contour = glyph.contours[0] | + | |
| - | glyph.contours.insert(0, | + | |
| - | </ | + | |
| - | + | ||
| - | + | ||
| - | + | ||
| - | If we’re using indexes to identify our objects, we now have a problem: the contour at index '' | + | |
| - | + | ||
| - | ==== Headaches ==== | + | |
| - | + | ||
| - | This issue with ‘proxy validity’ was one of several annoying and slightly subtle issues I ran into during this project. They were all more-or-less addressable, | + | |
| - | + | ||
| - | Some of the more interesting complications: | + | |
| - | + | ||
| - | === This issue with index validity === | + | |
| - | + | ||
| - | In this particular case, the solution is to augment all of our types with an additional identifier; this is just a token that uniquely identifies a particular object. When we create a proxy, we copy over this token, and then when we access the object, we check to make sure that the tokens match and return a '' | + | |
| - | + | ||
| - | === Not all objects are proxy objects === | + | |
| - | + | ||
| - | This proxy object approach works fine if you’re just loading a font and manipulating it, but what if you’re creating new objects? It is totally reasonable to have python code that looks something like: | + | |
| - | + | ||
| - | <code highlight> | + | |
| - | glyphA = Glyph(" | + | |
| - | glyphB = Glyph(" | + | |
| - | layer = Layer(glyphs=[glyphA, | + | |
| - | font.addLayer(" | + | |
| - | </ | + | |
| - | + | ||
| - | + | ||
| - | + | ||
| - | In this code, neither the glyphs nor the layer can be a proxy object when they’re initialized, | + | |
| - | + | ||
| - | This means that our code for '' | + | |
| - | + | ||
| - | <code highlight> | + | |
| - | enum GlyphInner { | + | |
| - | Layer { layer: LayerProxy, name: String }, | + | |
| - | Concrete(Arc< | + | |
| - | } | + | |
| - | + | ||
| - | struct GlyphProxy(GlyphInner); | + | |
| - | + | ||
| - | impl GlyphProxy { | + | |
| - | fn with< | + | |
| - | match & | + | |
| - | GlyphInner:: | + | |
| - | layer | + | |
| - | .get(& | + | |
| - | .map(f) | + | |
| - | .ok_or_else(|| ProxyError:: | + | |
| - | } | + | |
| - | GlyphInner:: | + | |
| - | } | + | |
| - | } | + | |
| - | // etc | + | |
| - | } | + | |
| - | </ | + | |
| - | + | ||
| - | + | ||
| - | + | ||
| - | Fortunately, | + | |
| - | + | ||
| - | === Being pythonic is tricky. === | + | |
| - | + | ||
| - | A fundamental goal of this project was matching the existing API, to the point where the main development goal was trying to pass the existing test suite, with only minimal modifications (for instance giving up on tests that required object identity, which doesn’t work with proxy objects.) | + | |
| - | + | ||
| - | [[https:// | + | |
| - | + | ||
| - | === Collections are hard === | + | |
| - | + | ||
| - | Let's say you have a simple type in rust that contains a '' | + | |
| - | + | ||
| - | <code highlight> | + | |
| - | # | + | |
| - | struct Point { | + | |
| - | x: i32, | + | |
| - | y: i32, | + | |
| - | } | + | |
| - | + | ||
| - | # | + | |
| - | struct Thing { | + | |
| - | items: Vec< | + | |
| - | } | + | |
| - | + | ||
| - | # | + | |
| - | impl Thing { | + | |
| - | #[new] | + | |
| - | fn new(items: Option< | + | |
| - | Thing { | + | |
| - | items: items.unwrap_or_default(), | + | |
| - | } | + | |
| - | } | + | |
| - | #[getter] | + | |
| - | fn get_items(& | + | |
| - | self.items.clone() | + | |
| - | } | + | |
| - | + | ||
| - | #[setter] | + | |
| - | fn set_items(& | + | |
| - | self.items = items; | + | |
| - | } | + | |
| - | } | + | |
| - | + | ||
| - | # | + | |
| - | impl Point { | + | |
| - | #[new] | + | |
| - | fn new(x: i32, y: i32) -> Self { | + | |
| - | Point { | + | |
| - | x, y | + | |
| - | } | + | |
| - | } | + | |
| - | + | ||
| - | #[getter] | + | |
| - | fn get_x(& | + | |
| - | self.x | + | |
| - | } | + | |
| - | + | ||
| - | #[setter] | + | |
| - | fn set_x(& | + | |
| - | self.x = val; | + | |
| - | } | + | |
| - | + | ||
| - | #[getter] | + | |
| - | fn get_y(& | + | |
| - | self.y | + | |
| - | } | + | |
| - | + | ||
| - | #[setter] | + | |
| - | fn set_y(& | + | |
| - | self.y = val; | + | |
| - | } | + | |
| - | } | + | |
| - | </ | + | |
| - | + | ||
| - | + | ||
| - | + | ||
| - | This looks nice: [[https:// | + | |
| - | + | ||
| - | <code highlight> | + | |
| - | thing = Thing([Point(42, | + | |
| - | assert thing.items[0].x == 42 | + | |
| - | thing.items = [Point(0, 0), Point(1, 1)] | + | |
| - | assert thing.items[-1].y == 1 | + | |
| - | </ | + | |
| - | + | ||
| - | + | ||
| - | + | ||
| - | The nice thing here is that we use a '' | + | |
| - | + | ||
| - | Unfortunately, | + | |
| - | + | ||
| - | <code highlight> | + | |
| - | thing = Thing([Point(42, | + | |
| - | thing.items[0].x += 5 | + | |
| - | assert thing.items[0].x == 47 # fails! | + | |
| - | </ | + | |
| - | + | ||
| - | + | ||
| - | + | ||
| - | The problem is that '' | + | |
| - | + | ||
| - | Unless I’m missing something, it doesn’t feel like there’s a great solution to this, unless you use proxy objects of some kind to represent the collection. | + | |
| - | + | ||
| - | === Value versus reference semantics, more generally === | + | |
| - | + | ||
| - | Collections are an illustration of a bigger issue, which is around value versus reference semantics. I think this is probably the most important thing to consider when designing a python API on top of rust: when do you want things to behave like values (where creating a new binding copies the object) and when do you want things to act like references (where new bindings reference the same underlying object.) | + | |
| - | + | ||
| - | The semantic mismatch between the two languages really encourages value types. Reference types are a pain, but they are important for things like collections. In some cases you can avoid exposing collections altogether, and just provide methods like '' | + | |
| - | + | ||
| - | ===== Learnings ===== | + | |
| - | + | ||
| - | Ultimately, the thing I was trying to achieve (fully reimplement an existing idiomatic python library on top of an existing rust library) is not something very many people should be attempting. Most people who are trying to use rust from python have a more specific goal: speeding up some particular piece of code, for instance. | + | |
| - | + | ||
| - | Getting this working was annoying, and I’m not very happy with the result. I haven’t written much python in the past five years or so; if I were more comfortable there I would probably have made some better choices, which might have made things easier, but probably only marginally. | + | |
| - | + | ||
| - | My main conclusion is pretty straightforward. If you wish to expose a python API from rust, you should think carefully about the design of that API ahead of time. Some good questions to ask: | + | |
| - | + | ||
| - | * //How much API do I need to expose?// The less API you need to write, the easier your life will be. | + | |
| - | * //How much does my API need to use python collections?// | + | |
| - | * //Can I limit the depth of my object graph?// If you have an object that contains a list of other objects, and those inner objects also have child objects, then you will need interior mutability or a proxy type at each of those levels. If you have an object with fields and the fields are value types, things are much easier. | + | |
| - | * //What should be in python, what should be in rust, and what should the contact points be?// I have spent ~5 years writing python and I have spent ~5 years writing rust. I like rust a lot! But I do not think it should be controversial to say that //python is the better language for **writing python**//. Where possible, you should limit the use of rust to those // | + | |
| - | + | ||
| - | ===== Finally ===== | + | |
| - | + | ||
| - | This was definitely a mixed experience. On the positive side, it is extremely easy and ergonomic to write a python module in Rust. On the downside, it is much harder than I had expected to expose an interface that felt truly at home in python. | + | |
| - | + | ||
| - | ==== Thanks ==== | + | |
| - | + | ||
| - | I found this work frustrating enough that when I finally had (mostly) finished this writeup, I was most eager to just forget about it and move on to something else; perhaps an example of the general phenomenon of [[https:// | + | |
| - | + | ||
| - | + | ||
| - | \\ | + | |
| - | + | ||
| - | + | ||
| - | + | ||
| - | ---- | + | |
| + | Creating this page to test neovim integration. | ||
| + | * tested script to upload | ||
| + | * tested apikey auth | ||
| + | * tested echo message | ||
| + | * added publish in neovim | ||
| + | * prevent non-markdown content from being published | ||
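
The "Collections are hard" passage in the removed revision describes a getter that clones its list on every access, so in-place edits to an element are silently lost. The pure-Python sketch below imitates that behaviour. It is only an illustration: the `Thing` and `Point` classes here are hypothetical stand-ins that use `copy.deepcopy` to mimic the cloning getter, not the actual Rust binding from the post.

```python
import copy


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Thing:
    """Hypothetical stand-in for a wrapper whose getter clones its list."""

    def __init__(self, items=None):
        self._items = list(items or [])

    @property
    def items(self):
        # Every access hands out fresh copies, like cloning a Vec on each get.
        return copy.deepcopy(self._items)

    @items.setter
    def items(self, value):
        self._items = list(value)


thing = Thing([Point(42, 0)])

# The mutation lands on a temporary copy, so the stored point is unchanged.
thing.items[0].x += 5
assert thing.items[0].x == 42

# Reading the list, editing it, and assigning it back does persist.
items = thing.items
items[0].x += 5
thing.items = items
assert thing.items[0].x == 47
```

The read-modify-write pattern at the end works with value semantics, but it is not what idiomatic python code would write, which is the mismatch the post is pointing at.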