-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix several bugs around the 'byteString' family of Builders #671
Fix several bugs around the 'byteString' family of Builders #671
Conversation
-- Ensure that the common case is not recursive and therefore yields | ||
-- better code. | ||
| op' <= ope = do copyBytes op ip isize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
op'
might overflow
op' = op `plusPtr` isize | ||
ip = unsafeForeignPtrToPtr ifp | ||
ipe = ip `plusPtr` isize | ||
k br = do touchForeignPtr ifp -- input consumed: OK to release here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not safe to put the touchForeignPtr
in the continuation like this because the call site may consume one copied chunk and then hang without ever invoking that continuation.
@@ -850,19 +850,17 @@ byteStringCopy = \bs -> builder $ byteStringCopyStep bs | |||
|
|||
{-# INLINE byteStringCopyStep #-} | |||
byteStringCopyStep :: S.StrictByteString -> BuildStep a -> BuildStep a | |||
byteStringCopyStep (S.BS ifp isize) !k0 br0@(BufferRange op ope) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Builders are not allowed to do anything with their continuation arguments until they are done writing, including forcing them. (This one now has some test cases.)
Data/ByteString/Builder/Internal.hs
Outdated
touchForeignPtr ifp | ||
k0 (BufferRange op' ope) | ||
| otherwise = wrappedBytesCopyStep (BufferRange ip ipe) k br0 | ||
-- What's the reasoning here, more concretely? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you going to elaborate on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard for me to elaborate. I thought about it for a while but still have no idea in what sense a non-recursive hot path was supposed to be "better" at the time this was written.
- For small statically-known lengths, nowadays, we might benefit from specialized
copyBytes
code generation.- But
copyBytes
just lowered to an opaque foreign call when this comment was written. - This special code generation is not great until ghc#24519 is fixed.
- Anyway, it'd be nice to somehow incur this extra branch only for literals if this is our reasoning.
- But
- Recursive stuff is much less likely to inline automatically.
- The fact that we need inlining of
b1
to happen to make aBuilder
concatenationb1 <> b2
fast is pretty ugly.- I think on my machine the reboxing of args and the unknown call incur around 15ns of overhead per builder. There can also be a bit more overhead if the continuation has to allocate a PAP.
- At the time the
Builder
core stuff was written, it was found that unboxing theBuilder
args actually made things slower, for obscure RTS fast-call-pattern reasons. (I think it was about equal when I tried it a year or two ago.)
- If this is really that important, inlining of the whole loop could be forced with an inner-
go
and anINLINE
pragma. And that entire loop isn't bigger than this one unrolled iteration!
- The fact that we need inlining of
I suppose in light of the lack of clear reasoning I'm fine with deleting this comment and unifying byteStringCopyStep
with wrappedBytesCopyStep
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also have to look at benchmarks, but we don't have any that really test this stuff right now. That changes in #569, but actually a change I wanted to make in that PR is blocked on this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out I was wrong. There is a relevant benchmark already, foldMap byteStringCopy
, and getting rid of the byteStringCopyStep
/wrappedBytesCopyStep
distinction slows that benchmark down by about 30% for some very obscure reasons. I'll push a comment describing these reasons in a few minutes.
This whole business with the continuation-threading causing practically unpredictable performance issues with Builder
s has gotten very tiresome. Perhaps I will revisit #194 relatively soon.
This makes explicit the reasoning for in what sense "ensur[ing] that the common case is not recursive" is expected to possibly "yield[] better code."
* Fix several bugs around the 'byteString' family of Builders * Add Note [byteStringCopyStep and wrappedBytesCopyStep] This makes explicit the reasoning for in what sense "ensur[ing] that the common case is not recursive" is expected to possibly "yield[] better code." # Conflicts: # tests/Properties/ByteString.hs
* Fix several bugs around the 'byteString' family of Builders * Add Note [byteStringCopyStep and wrappedBytesCopyStep] This makes explicit the reasoning for in what sense "ensur[ing] that the common case is not recursive" is expected to possibly "yield[] better code."
No description provided.