prp: refactor #47

Open · wants to merge 1 commit into dev
Conversation

@ee7 commented Feb 16, 2024

A small refactor, in case we didn't need the C implementation here. But this is mainly to begin to discuss some Nim patterns for long-term code readability/maintainability in general.

If we really care about performance here, I admit that this PR is likely a performance regression. I believe the C implementation is likely to be lowered to SIMD instructions, but this Nim one isn't because the codegen for strings cannot be optimized as easily. See https://nim.godbolt.org/z/G8Ys4vPce (or here with --opt:size), and note that the Nim implementation with byte is similarly optimized.

I believe it'd be a little more idiomatic to prefer array[n, byte] or openArray[byte] or openArray[char] rather than using string. And the compiler can optimize that better. But openArray is admittedly awkward sometimes, especially given that view types are still marked as experimental.
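As a hedged sketch of that point (the proc names and data below are invented for illustration, not this module's actual API), the same in-place XOR can be written over `openArray[byte]` instead of `string`:

```nim
# Invented example: in-place XOR over openArray[byte] vs. string.

proc xorInPlace(dst: var openArray[byte], src: openArray[byte]) =
  ## XOR `src` into `dst`. Loops over bytes like this are the kind of
  ## code the backend can often vectorize.
  for i in 0 ..< min(dst.len, src.len):
    dst[i] = dst[i] xor src[i]

proc xorInPlaceStr(dst: var string, src: string) =
  ## The same operation over `string`; the codegen for strings is
  ## typically harder to optimize, per the godbolt link above.
  for i in 0 ..< min(dst.len, src.len):
    dst[i] = char(uint8(dst[i]) xor uint8(src[i]))

var a = @[byte 0x0f, 0xf0, 0xaa]
a.xorInPlace([byte 0xff, 0xff, 0xff])
doAssert a == @[byte 0xf0, 0x0f, 0x55]
```

Both compile; the point is only that the `openArray[byte]` version gives the optimizer a simpler shape to work with.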


Replace the xor_in_place C function with a pure Nim implementation. This avoids use of unsafe language features and improves clarity. In particular, a reader should not generally expect that calling a proc with a signature like

proc foo(bar: Bar) =

mutates the non-ref object bar via addr. That's what happened previously with:

proc runHmacPrf(ctx: PrpCtx, key: string) =

Of course, the reader can get it from the context, but it's still less clear.
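A minimal sketch of the contrast (the `counter` field is invented; `PrpCtx`'s real fields are not shown here): mutation through a pointer compiles with a plain value parameter, but only the `var` signature tells the reader that mutation happens:

```nim
type PrpCtx = object
  counter: int

proc bumpHidden(ctx: PrpCtx) =
  # `addr` on a parameter is allowed since Nim 2.0 (previously
  # `unsafeAddr`). Whether the caller ever observes this increment
  # depends on whether Nim passed a copy or a hidden pointer,
  # which is exactly the surprise the commit message is about.
  let p = addr ctx
  inc p.counter

proc bump(ctx: var PrpCtx) =
  # Self-documenting: the signature says `ctx` may be mutated.
  inc ctx.counter

var c = PrpCtx(counter: 0)
c.bump()
doAssert c.counter == 1
```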

At the time of writing, we only use this module's public procedures (prp and brb) in chalk, and only in attestation.nim.

Also, make some other changes for readability:

Style note: for code readability, it is best to use the least powerful programming construct that remains expressive. So the checklist is:

  1. Use an ordinary proc/iterator, if possible.
  2. Else: Use a generic proc/iterator, if possible.
  3. Else: Use a template, if possible.
  4. Else: Use a macro.

In general, unnecessary template usage has the potential to:

  • Significantly increase compilation time.
  • Increase code size needlessly: a template removes the ability for the compiler to deliberately not inline something, and removes the user's ability to choose to optimize for code size (via --opt:size).
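As a tiny invented example of step 1 versus step 3 in the checklist above, the same trivial helper written both ways:

```nim
# Invented helper, written as a template and as a proc.

template isEvenT(x: int): bool =
  # Always expands inline at every call site; the compiler has no choice.
  (x and 1) == 0

proc isEvenP(x: int): bool {.inline.} =
  # `{.inline.}` is only a hint; the compiler can still decline to
  # inline (e.g. under --opt:size), which a template takes away.
  (x and 1) == 0

doAssert isEvenT(4) and isEvenP(4)
doAssert not (isEvenT(3) or isEvenP(3))
```

When both spellings work, the proc is the less powerful, and therefore preferred, construct.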

And make some other changes for readability:

- Prefer `let` over `var`.

- Prefer to initialize explicitly. It is expected that this will
  eventually be enforced in a future version of Nim [1].

- Prefer the object construction syntax.

- Prefer the least powerful construct: proc, rather than template. We
  can rely on the compiler to inline when appropriate.

- Remove a redundant `$` operation.

[1] https://github.com/nim-lang/Nim/blob/v2.0.0/doc/manual_experimental.md#strict-definitions-and-out-parameters
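A small before/after sketch of those bullets (the `Config` type and its values are invented for illustration):

```nim
type Config = object
  name: string
  retries: int

# Before: `var`, piecemeal field assignment, and a redundant `$`.
var c1: Config
c1.name = $"chalk"        # `$` on something already a string is a no-op
c1.retries = 3

# After: `let` with explicit initialization via object-construction
# syntax, in one place.
let c2 = Config(name: "chalk", retries: 3)

doAssert c1 == c2
```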
@ee7 ee7 requested a review from viega as a code owner February 16, 2024 11:44
@viega (Contributor) commented Feb 16, 2024

Some of the items you bring up here as changes you'd make are things I definitely agreed with at the time. For instance, I never care about performance until it's a proven problem, so I don't mind the code going to Nim. It was in C because I tried multiple different Nim implementations and had either problems getting it to compile at all, or bizarre memory problems that were fixed when I wrote it in C.

I can talk about why some of the things here are the way they are (e.g., I remember fighting with Nim on some of the things I'd consider basic in this file, and I remember a few things that seemed like they should have worked, before I got exasperated and dropped to C).

But I don't think that's important (I'm 100% fine w/ it being slower and in Nim), I think the broader style conversation is, and thanks for bringing it up.

From a style perspective, I don't care as much about 'idiomatic Nim' as I do trying to make things as easy to follow as possible for someone coming to Nim from outside. There aren't many Nim programmers anyway, so if our code is something people are looking at, it should be as clear as possible.

And I also think some of the things that are 'idiomatic nim' obfuscate things for the programmer, because they are too intertwined with Nim's implementation details.

Particularly, I think that the var / let world conflates multiple concepts in ways that are not obvious, and pushes work to the developer that I'm not sure is worth asking them to do.

Most people's initial understanding of let is immutable. Granted, this might be a bit old school now, because plenty of languages have gone to the Nim semantics at this point. But:

  • Partial immutability is unintuitive and hard to explain without a lower-level understanding of things.
  • It is not too valuable a tool for programmers to have a promise that the top-level value will not be assigned to again, in the context of a local function (more on cross-function boundaries below).
  • The requirement to assign in the let statement itself leads to incredibly ugly code that I have lived with, but honestly regret.

For instance, of these two patterns:

var whatever: SomeObj
if y == true:
  whatever = onething
else:
  whatever = somethingelse

I would rather use the former and give up the let than be forced to write:

let whatever =
  if y == true:
    onething
  else:
    somethingelse

The former is much more clear.

In the context of just a single function, the let is not saving the programmer from many bugs, and it should not have any impact on performance... without a function boundary, the analysis is facile.

With cross-function calls, I think there's more of a case for it being valuable to the programmer to ensure immutability. However, that's where Nim's semantics get muddled, and prefer the implementation to the programmer.

First, even if I declare something var locally, it automatically gets 'let' semantics across function boundaries. Second, no matter what I do, there's not actually a promise of immutability, since the language isn't truly passing by value. The only real promise is that the top-level value won't be replaced.

Things get muddled when we move to var parameters, because their semantics are complicated. This is due to Nim having a distinction between 'object' and 'ref object'.

Most languages don't have the object / ref object distinction. Procedural and OO languages generally have reference semantics by default, and functional languages generally have value semantics. And if a language gives you a way to change the behavior, it tends to be in the parameter passing.

But Nim having modifiers like var to indicate pass by reference, plus ref to indicate that the object always passes by reference, makes it impossible to easily understand semantics.

var for parameters on the surface says, "you can re-assign the data object for me", but in Nim, because of the two types of object declaration, it also controls whether you get reference semantics, since object gets you closer to real immutability.

Which would be fine, except, very few new people coming in would ever have a good intuition about when fields of something passed are going to be mutable. What happens when the object is ref but the field has an object that is not? What happens when the field is ref but the object is not?

Even though I am reasonably aware of how Nim's implementation works, I still often have to test to understand what is going to happen.
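Those two questions can be probed directly. A hedged sketch, with all type names invented for illustration:

```nim
type
  Inner = object          # plain object, stored inline in its holder
    n: int
  InnerRef = ref object   # ref object, shared rather than copied
    n: int
  Holder = ref object     # ref object: passed by reference
    inner: Inner          # non-ref field inside a ref object
  Value = object          # non-ref object: copied on a plain pass
    inner: InnerRef       # ref field inside a non-ref object

proc mutateHolder(h: Holder) =
  # Visible to the caller: `h` itself is a reference.
  h.inner.n = 1

proc mutateValue(v: Value) =
  # `v` may be a copy, but the field is a ref, so the copy shares it
  # and the mutation is visible to the caller anyway.
  v.inner.n = 1

let h = Holder(inner: Inner(n: 0))
h.mutateHolder()
doAssert h.inner.n == 1

let v = Value(inner: InnerRef(n: 0))
v.mutateValue()
doAssert v.inner.n == 1
```

In both directions the mutation leaks out, but for different reasons, which is hard to read off the signatures alone.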

The fact that immutability and reference passing are intertwined in that notion of var is the core issue, but I've had other problems with this interaction.

I have had more than a few memory errors that are clearly due to interactions between var parameters and ref objects. I've had issues when a ref object is passed to a var parameter (and then passed elsewhere), with objects that have ref object fields, and even when passing objects with ref object parameters.

Even though you're asked to use object by default, I have found the only way to get predictable semantics is to almost exclusively use ref object, and only use var to indicate its natural meaning of, "you can (re)assign this for me".

I'm sure the problems mainly stem from when Nim chooses to copy, and when Nim chooses to send a reference, and I understand that's arcane, and why it's arcane.

But that implementation detail should NEVER need to be visible to the programmer in my view. I'm sure it's not intentionally that way, but I don't think it's acceptable to end up putting the mental burden on developers.

To be clear, I was initially reasonably happy w/ the idea of 'ref object' vs 'object', because as a C programmer I definitely like the flexibility of whether to embed a reference or whether to inline the data structure. I didn't previously love the approach of "you always get a pointer". But in Nim it is not worth the tradeoff of unclear semantics for me.

All of this to say, I'm not actually a fan of many of Nim's idiomatic bits:

  • Many times let by default has led me to compile errors for code that would have worked had I not been forced to decide in advance if it was going to be mutable or not. That's super unnecessary friction, and I don't see much benefit. If the answer is, "it becomes faster Nim", that's more on Nim than me, and I don't generally care about performance.
  • I think object construction syntax is wonky in its irregularity, made doubly disappointing in not having real constructors.
  • While I've used openarray some, it is a horrible unnecessary exposing of implementation details, and is incredibly clunky. Not worth it.
  • It's comfortable and natural for people to use strings. I don't think they should have to worry about the implementation details or conversions, and my primary goal with APIs is to try to avoid making programming any harder than necessary (so hide implementation details like that).
  • I strongly dislike when it's not clearly communicated in the code whether or not a function is being called.

The last one is a Nim-ism picked up from the functional world that values compactness over clarity (I think it dates all the way back to the first version of ML, if not further), that I have absolutely never been on board with.

In your code, uint8 looks like a data field. I wouldn't find uint8 a[i] any clearer. I think we should almost always use parentheses for clarity.

I can admit there are cases where it's clear enough, and helps declutter, especially when calling echo since plenty of languages have a statement instead of a function for outputting. That's slightly okay and I sometimes do it, but otherwise I'd like to always make function calls explicit if you don't mind.
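For concreteness (the values below are invented), the two spellings side by side:

```nim
let a = "AB"

# Command syntax: `uint8` reads like a prefix or a field, not a call.
let x = uint8 a[0]

# Explicit parentheses make the conversion call unmistakable.
let y = uint8(a[0])

doAssert x == y
doAssert y == 65'u8   # 'A'
```

Both parse to the same conversion; the disagreement is purely about which one a newcomer can read without knowing Nim's command-call syntax.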

Otherwise, I don't mind if other people want to use constructs that are idiomatic nim. But I also feel no particular love for what's idiomatic, and have no desire to enforce idioms I don't feel are going to bring a lot of value. If it's causing people difficulty in driving overall chalk functionality and ux forward that's one thing, but otherwise I'd rather invest our effort on those things that really will move the needle for people.

Besides, I'm trying to get out of the critical path on anything code related here, so you'll have wide latitude on style. I'd ask two things though:

  1. Try to put clarity and correctness above all else when it comes to style.
  2. Please try to only go back and clean up for style when you're touching something that's more important to making Chalk more valuable to users.

@viega (Contributor) commented Feb 16, 2024

BTW, after 30 years of not having any problem w/ Python's white space approach, and not really understanding why it's so polarizing (despite having written tons of C and Python), I have come to be on team braces in the past year.

It's not really Nim's fault, but my comment above illustrates part of the problem: when I typed the let x = if ... example, the indentation was right, but GitHub rendered it in a way that wouldn't compile.

I also have had lots of issues where moving code around is more difficult without brace delimiters, and it leaves bugs in code that still compiles. The braces help address that. Nothing to action, just an observation :)
