It has been repeatedly requested that `fixed-time-window-fn` both (a) automatically dump its buffer at the end of each time window (#563, #259, #728) and (b) remain the way it is (#728 (comment), #728 (comment)). It has also been noted that the code as-is will drop events from an older time window that arrive after events from a newer one (#728 (comment)).
An idea I had is to add a (maybe optional) argument to `fixed-time-window-fn` (let's call it `pippo` for now because I need an excuse to use this var name) which specifies how many seconds to wait after a time window ends before flushing that window's buffer. A `fixed-time-window-fn`-based stream would then maintain multiple buffers, one per time window, whenever `pippo > 0`. Each buffer would only be flushed `pippo` seconds after its time window has ended, even if events from later time windows have already been received.
This is advantageous in two ways:
- In configurations where upstream streams may delay events fed to `fixed-time-window-fn`, `pippo` can be set to slightly more than the maximum time an event can take to reach the `fixed-time-window-fn` stream. This allows users like @Anvil to take advantage of automatic buffer flushing without worrying about missing events.
- Even if a `fixed-time-window-fn`-based stream is not fed by parent streams that induce delays, it fixes the problem of losing events that arrive slightly out of order. I'd imagine that setting `pippo` to a small value like `10` (seconds) should be enough for most events that arrive slightly out of order to still be included in the vector of events for the appropriate time window, instead of being dropped.
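To make the proposed behaviour concrete, here is a rough sketch in Python (not Clojure, and not Riemann's actual code). Everything in it is hypothetical: the class name `GraceWindowBuffer`, the `add`/`tick` methods, and the assumption that events are dicts with a `"time"` key in seconds and that the caller drives the clock through `tick`. The `grace` argument plays the role of `pippo`.

```python
class GraceWindowBuffer:
    """Fixed time windows whose buffers are flushed `grace` seconds after they end.

    Illustrative sketch only; names and structure are assumptions.
    """

    def __init__(self, window, grace=0):
        self.window = window  # window length in seconds
        self.grace = grace    # "pippo": seconds to wait after a window ends
        self.buffers = {}     # window index -> list of buffered events
        # Windows with an index below this are closed and accept no more events.
        self.closed_before = float("-inf")

    def add(self, event):
        """Buffer an event; return False if its window was already closed."""
        idx = int(event["time"] // self.window)
        if idx < self.closed_before:
            return False  # too late, the window was flushed (or expired empty)
        self.buffers.setdefault(idx, []).append(event)
        return True

    def tick(self, now):
        """Return the buffers of all windows that ended >= `grace` seconds ago."""
        # Window idx ends at (idx + 1) * window, so it is closed once
        # (idx + 1) * window + grace <= now.
        newly_closed = int((now - self.grace) // self.window)
        self.closed_before = max(self.closed_before, newly_closed)
        ready = sorted(i for i in self.buffers if i < self.closed_before)
        return [self.buffers.pop(i) for i in ready]
```

With `window=10` and `grace=3`, an event with time 9 that shows up after an event with time 12 still lands in the first window, because that window is only flushed by `tick(13)`. With `grace=0`, the same late event is rejected once `tick(12)` has closed the first window.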
The only concern I have regarding such an implementation is memory usage when `pippo` is set very high. It seems like one shouldn't need to set it to more than a few times the length of the time window, so it shouldn't use more than a few times the memory that it does at present. I think if the functionality is documented well this shouldn't be a problem.
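As a back-of-the-envelope check on the memory concern, the number of window buffers alive at once can be bounded. This is a sketch under my own assumptions, not a measurement: event timestamps are never ahead of the clock, and a window is flushed as soon as `pippo` seconds have passed since it ended. The function name `max_open_windows` is made up for illustration.

```python
import math


def max_open_windows(window, grace):
    """Upper bound on simultaneously open window buffers.

    One buffer for the current window, plus every earlier window that
    ended less than `grace` seconds ago.
    """
    return 1 + math.ceil(grace / window)
```

For example, with a 10 second window, `pippo = 0` keeps a single buffer as today, and `pippo = 30` keeps at most four. At a roughly steady event rate, memory should scale by about that factor.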