Replies: 17 comments
-
By safely I mean in a way which allows us to avoid this potential problem when reading events: in such a case we can get the IDs of events 0 and 2, but skip reading the uncommitted event nr 1. We would like to avoid such problems. From my reading about it and thinking it over, I see 2 potential solutions:
In case of a very long transaction with events, the write throughput would not be affected, but a separate queue-reading process would not be able to proceed until the transaction is finished. As very long transactions don't happen often, it might not ever be a problem in an app.
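A minimal sketch of how such a gap can show up, assuming a Postgres table named `event_store_events` with an auto-incremented `id` (names here are illustrative, not the actual RES schema):

```ruby
# Sequence values are allocated at INSERT time, not at COMMIT time, so a
# reader can observe ids 1 and 3 while id 2 belongs to a still-uncommitted
# long-running transaction.

# Console 1 - long transaction, not committed yet:
ActiveRecord::Base.transaction do
  ActiveRecord::Base.connection.execute(
    "INSERT INTO event_store_events (data) VALUES ('event nr 2')"
  )
  sleep 60 # simulate slow business logic before COMMIT
end

# Console 2 - another process inserts and commits immediately:
ActiveRecord::Base.connection.execute(
  "INSERT INTO event_store_events (data) VALUES ('event nr 3')"
)

# Console 3 - a reader polling the table sees ids 1 and 3, but not 2.
# If it remembers "last seen id = 3", it will skip event 2 forever once
# console 1 finally commits.
rows = ActiveRecord::Base.connection.execute("SELECT id FROM event_store_events ORDER BY id")
rows.each { |row| puts row["id"] } # => 1, 3
```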
-
Imagine a starting point like this:
event nr 1 in the global stream was inserted:
event nr 2 is being inserted but it takes a lot of time:
event nr 3 was inserted:
The supervisor process responsible for filling gaps would try to execute
It will continue failing until the long transaction is committed or rolled back. The process/logic responsible for exposing the global event log won't return anything >= 2 while there is a gap. I think that this way:
@mlomnicki @mpraglowski what do you think?
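A minimal sketch of the gap-filling supervisor idea, under the assumption that it simply tries to insert a placeholder row with the missing id (table and column names are illustrative, and this is only my reading of the approach described above):

```ruby
# Hypothetical gap filler. The primary key on id makes this INSERT wait for
# the in-flight transaction that reserved the missing id. When that
# transaction commits, we get a unique violation (the real event filled the
# gap); when it rolls back, our placeholder row fills the gap instead.
# A short lock_timeout could be set to fail fast and retry instead of waiting.
def try_to_fill_gap(missing_id)
  ActiveRecord::Base.connection.execute(<<~SQL)
    INSERT INTO event_store_events (id, data)
    VALUES (#{Integer(missing_id)}, '{"gap_filler": true}')
  SQL
  :filled_with_placeholder
rescue ActiveRecord::RecordNotUnique
  :filled_by_real_event
end
```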
-
The solution presented above shows it should be possible in Postgres; I haven't checked how MySQL would behave in such a situation.
-
The global stream would be quite disconnected from normal streams. Because of its importance, it would have its own dedicated table to be able to run the IDs trick. Hmm, but maybe that's not even necessary. It was just easier for me to consider it being implemented that way and to write such a proof of concept.
-
@paneq I need to read it more carefully, but at first sight this solution seems best to me.
I'd go with this solution. Whether it would be an optimistic or a pessimistic lock doesn't matter, but if we want to avoid race conditions we have to lock the stream. Obviously an optimistic lock would be preferred. Anyway, your solution looks interesting and I'd be happy to give it a try and a closer look. Maybe it would be best to implement it as an alternative rails_event_store_active_record? By default RES would offer a simple persistence model, but everyone would be free to switch to a faster solution simply by using another repository.

I imagine that ideally we'd like to be both free of race conditions and super fast, but in my humble opinion we must pick only one. It'll be either consistent or fast*, not both. As I already mentioned in another thread, if we want fast writes then maybe we can introduce a special mode, or a special type of stream, which wouldn't give any guarantees on consistency but would offer better throughput. We could offer 2 strategies:
then it'd be up to the developer to pick whatever works best for them in the given context.

*[obviously "fast" is very relative and subjective...]
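A minimal sketch of the optimistic-locking direction mentioned above, assuming a unique index on (stream, position) so that concurrent writers conflict instead of silently racing (an illustration only, not the actual RES repository code):

```ruby
# Hypothetical optimistic append: read the stream's current version, insert
# at version + 1, and let the unique index on (stream, position) reject the
# loser of a concurrent race so it can retry or surface a conflict error.
def append_to_stream(stream, event_data)
  conn = ActiveRecord::Base.connection
  current = conn.select_value(
    "SELECT COALESCE(MAX(position), 0) FROM event_store_events_in_streams " \
    "WHERE stream = #{conn.quote(stream)}"
  ).to_i
  conn.execute(<<~SQL)
    INSERT INTO event_store_events_in_streams (stream, position, data)
    VALUES (#{conn.quote(stream)}, #{current + 1}, #{conn.quote(event_data)})
  SQL
rescue ActiveRecord::RecordNotUnique
  raise "stream #{stream} moved past version #{current} - retry or report a conflict"
end
```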
-
Situation:
Let's say we iterate over events ascending starting from 0 and we see a gap in position 2 (ids 1 and 3 are visible, 2 is not).

Hypothesis

```ruby
r = ActiveRecord::Base.connection.execute "SELECT id, xmin, xmax FROM event_store_events"
r.each { |x| puts x }
# {"id"=>"1", "xmin"=>"22085", "xmax"=>"0"}
# {"id"=>"3", "xmin"=>"22088", "xmax"=>"0"}

xminc = "SELECT * FROM txid_snapshot_xmin(txid_current_snapshot());"
r = ActiveRecord::Base.connection.execute(xminc)
r.each { |x| puts x }
# => {"txid_snapshot_xmin"=>22086}
```

We can only read rows with xmin below that snapshot xmin (22086), so only event 1 for now. Now I interrupt the ongoing transaction from console 2.

```ruby
xminc = "SELECT * FROM txid_snapshot_xmin(txid_current_snapshot());"
r = ActiveRecord::Base.connection.execute(xminc)
r.each { |x| puts x }
# {"txid_snapshot_xmin"=>22089}

r = ActiveRecord::Base.connection.execute "SELECT id, xmin, xmax FROM event_store_events"
r.each { |x| puts x }
# {"id"=>"1", "xmin"=>"22085", "xmax"=>"0"}
# {"id"=>"3", "xmin"=>"22088", "xmax"=>"0"}
```

Now we know there will be a gap at id 2: the snapshot xmin has moved past the interrupted transaction and no committed row with id 2 appeared.

Based on:

Perhaps I am mistaken but it seems to me we could just do:

```sql
SELECT *
FROM event_store_events_in_streams
WHERE stream = RubyEventStore::GLOBAL_STREAM
  AND id > last_seen_id
  AND xmin < txid_snapshot_xmin(txid_current_snapshot())
ORDER BY id ASC
```

if that's possible.

P.S. I tested on

Dragons
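A hedged sketch of how that read could be wrapped in Ruby. The `xmin` system column has type `xid`, so it needs a cast before it can be compared with the bigint returned by `txid_snapshot_xmin()`; the table name, the `'all'` stream value and the method shape are assumptions for illustration:

```ruby
# Sketch, not the RES implementation: return only rows whose inserting
# transaction is below the current snapshot's xmin, i.e. rows behind which
# no still-uncommitted id can later materialize.
def read_global_stream_batch(last_seen_id, batch_size: 100)
  sql = <<~SQL
    SELECT id, data
    FROM event_store_events_in_streams
    WHERE stream = 'all'
      AND id > #{Integer(last_seen_id)}
      AND xmin::text::bigint < txid_snapshot_xmin(txid_current_snapshot())
    ORDER BY id ASC
    LIMIT #{Integer(batch_size)}
  SQL
  ActiveRecord::Base.connection.execute(sql).to_a
end
```

One caveat worth noting: `xmin` is a 32-bit xid while `txid_snapshot_xmin()` returns an epoch-extended 64-bit value, so this naive cast misbehaves after transaction-id wraparound.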
-
so this part would need to be implemented in Ruby. Fortunately that's easy.
-
Sounds good to me. One detail: this is all under the assumption that we won't update event rows, right? I am asking because of the following statement in
So I imagine that if I were to start some longer process which is updating an old event (let's say, a 1-year-old event), then
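This can be checked directly: an UPDATE rewrites the row's xmin to the updating transaction's id, and a long-running updating transaction also holds `txid_snapshot_xmin()` back, stalling the reader. A minimal sketch (illustrative table name) of the first effect:

```ruby
# Sketch: demonstrate that an UPDATE changes a row's xmin.
conn = ActiveRecord::Base.connection

puts conn.execute("SELECT id, xmin FROM event_store_events WHERE id = 1").first
# e.g. {"id"=>"1", "xmin"=>"22085"}

# Touch the one-year-old event (e.g. some backfill or data migration).
conn.execute("UPDATE event_store_events SET data = data WHERE id = 1")

puts conn.execute("SELECT id, xmin FROM event_store_events WHERE id = 1").first
# xmin is now the id of the updating transaction, far above 22085, so a
# reader filtering on xmin would treat this old event as "not yet safe" again.
```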
-
Other challenges found. Research continues.
-
TL;DR in Polish of all my research: https://youtu.be/xJpEOCiyJxw
-
As far as I understand, the only ways to get the list of records in the order they were committed (without doing any delete or update operations per synchronizing client) are:
You can linearize your writes to achieve the properties that we needed - #403
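A minimal sketch of one way writes can be linearized, using a transaction-scoped advisory lock so that ids become visible in the same order they are committed (the lock key and table name are illustrative, and not necessarily how #403 does it):

```ruby
# Sketch: serialize all appends to the global log behind one advisory lock.
# Only one append transaction is in flight at a time, so once a reader sees
# id N committed, every id below N is either committed or permanently gone.
GLOBAL_LOG_LOCK_KEY = 1_845_240_511 # arbitrary app-wide constant

def append_linearized(event_data)
  ActiveRecord::Base.transaction do
    conn = ActiveRecord::Base.connection
    # Held until COMMIT/ROLLBACK; concurrent writers queue up here.
    conn.execute("SELECT pg_advisory_xact_lock(#{GLOBAL_LOG_LOCK_KEY})")
    conn.execute(<<~SQL)
      INSERT INTO event_store_events (data) VALUES (#{conn.quote(event_data)})
    SQL
  end
end
```

The obvious trade-off is write throughput: every append waits for the previous one to finish.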
-
reproducible test: https://github.com/mostlyobvious/postgresql-application-log-implementations/blob/3dc569f6812b2bef27e69970cdcc75149852c593/log_test.rb |
-
https://www.citusdata.com/blog/2018/06/14/scalable-incremental-data-aggregation/ is also in favour of locking the table for writes in the reader.
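A minimal sketch of that approach, assuming the reader briefly takes a table lock that conflicts with writers, so every in-flight insert has finished before it picks its upper bound (the Ruby wrapper and table name are illustrative):

```ruby
# Sketch: block new writes for a moment, wait for in-flight writers to
# finish, record the current max id, then release the lock by committing.
# Everything <= upper_bound is then stable and can be read without the
# risk of a lower id committing later.
def stable_upper_bound
  ActiveRecord::Base.transaction do
    conn = ActiveRecord::Base.connection
    # EXCLUSIVE conflicts with INSERT/UPDATE/DELETE but not with plain SELECTs.
    conn.execute("LOCK TABLE event_store_events IN EXCLUSIVE MODE")
    conn.select_value("SELECT COALESCE(MAX(id), 0) FROM event_store_events").to_i
  end
end

last_seen_id = 0 # the position remembered by this consumer
upper_bound  = stable_upper_bound
events = ActiveRecord::Base.connection.execute(
  "SELECT id, data FROM event_store_events " \
  "WHERE id > #{last_seen_id} AND id <= #{upper_bound} ORDER BY id"
)
```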
-
Implement Event Store in a way that would allow clients to safely iterate over the global stream of all events, with pagination and no race conditions. The client would remember its position in the stream of events on its own side.
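A minimal sketch of the consumer side described here, assuming some reader object whose `read_batch(after:, limit:)` only returns safely readable events (per one of the strategies discussed above); the reader, the checkpoint store and their method names are illustrative:

```ruby
# Sketch: the client keeps its own checkpoint (last processed id) and pages
# through the global stream. Correctness then rests entirely on read_batch
# never returning id N while some id < N could still be committed later.
class GlobalStreamConsumer
  def initialize(reader, checkpoint_store)
    @reader      = reader           # responds to read_batch(after:, limit:)
    @checkpoints = checkpoint_store # responds to load and save(position)
  end

  def poll
    position = @checkpoints.load || 0
    loop do
      batch = @reader.read_batch(after: position, limit: 100)
      break if batch.empty?
      batch.each { |event| handle(event) }
      position = batch.last.fetch(:id)
      @checkpoints.save(position) # remembered on the client's side
    end
  end

  def handle(event)
    puts event
  end
end
```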