Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chop datafunc needs chop interval #112

Open
jeffwu78 opened this issue Nov 22, 2021 · 3 comments
Open

Chop datafunc needs chop interval #112

jeffwu78 opened this issue Nov 22, 2021 · 3 comments

Comments

@jeffwu78
Copy link

The chop datafunc has access to the old interval and whether it is a lower or upper chop, but it does not have access to the requested chop interval. This seems like a major oversight. Am I missing something?

To clarify, in the example given the documentation

def datafunc(iv, islower):
... oldlimit = iv[islower]
... return "oldlimit: {0}, islower: {1}".format(oldlimit, islower)
t = IntervalTree([Interval(0, 10)])
t.chop(3, 7, datafunc)

datafunc is given access to the old interval (0, 10), but not the actual chop interval (3, 7).

@chaimleib
Copy link
Owner

chaimleib commented Nov 23, 2021

The datafunc exists to return what the data field of the resulting interval(s) should be. The result interval has not been finalized yet at the time the datafunc is called, and so it is not provided. However, its bounds can be determined by combining the arguments given with knowledge of the boundary values provided to chop (via lexical scoping).

iv as provided in your quoted snippet is one of the original intervals that are getting sliced, in case this is needed to build the return value.

islower is a boolean that says whether the resulting interval is less than lower bound provided to chop. If so, it is True; otherwise, islower is False and the resulting interval is greater than the upper bound provided to chop. In the snippet, I used it to help select the boundary value of iv that will be obliterated in the chop in oldlimit = iv[islower].

Note that if the chop boundaries both lie within an interval to be chopped, datafunc will be called twice with the same iv: once with islower = True and once with islower = False.

@jeffwu78
Copy link
Author

The interval data field in my application is a time series, and in my datafunc function, I want to cut out the part of the time series not belonging to chop interval. The problem is the datafunc function does not have access to the endpoints of the chop interval needed to do this.

Although now that I think of it, there is a work-around to the problem. I have an "extended" datafunc function also takes in the chop interval endpoints, and then use a lambda function to pass in these endpoints and create the desired datafunc for the chop function.

@johentsch
Copy link

johentsch commented Oct 23, 2023

Found this when I had the same question for .slice(), so I'm spelling out the above solutions by @chaimleib and @jeffwu78 which work the same way for .chop() by integrating the second parameter.

Problem

def datafunc(sliced_interval, islower):
    a, b, _  = sliced_interval
    if islower:
        return f"new data = remove ['point', {b}) from [{a}, {b})"
    return f"new data = remove [{a}, 'point') from [{a}, {b})"

t = IntervalTree([Interval(0, 10), Interval(5, 15)])
t.slice(6, datafunc)
Result where datafunc does not have access to slice point 6

[Interval(0, 6, "new data = remove ['point', 10) from [0, 10)"),
 Interval(5, 6, "new data = remove ['point', 15) from [5, 15)"),
 Interval(6, 10, "new data = remove [0, 'point') from [0, 10)"),
 Interval(6, 15, "new data = remove [5, 'point') from [5, 15)")]

Solution inspired by @chaimleib

def slice_the_tree(t, point):
    
    def datafunc(sliced_interval, islower):
        a, b, _  = sliced_interval
        if islower:
            return f"new data = remove [{point}, {b}) from [{a}, {b})"
        return f"new data = remove [{a}, {point}) from [{a}, {b})"

    t.slice(point, datafunc)

t = IntervalTree([Interval(0, 10), Interval(5, 15)])
slice_the_tree(t, 6)
Result where datafunc has access to slice point 6

[Interval(0, 6, 'new data = remove [6, 10) from [0, 10)'),
 Interval(5, 6, 'new data = remove [6, 15) from [5, 15)'),
 Interval(6, 10, 'new data = remove [0, 6) from [0, 10)'),
 Interval(6, 15, 'new data = remove [5, 6) from [5, 15)')]

Solution inspired by @jeffwu78

def extended_datafunc(sliced_interval, islower, point):
    a, b, _  = sliced_interval
    if islower:
        return f"new data = remove [{point}, {b}) from [{a}, {b})"
    return f"new data = remove [{a}, {point}) from [{a}, {b})"

t = IntervalTree([Interval(0, 10), Interval(5, 15)])
t.slice(6, lambda sliced_interval, islower: extended_datafunc(sliced_interval, islower, 6))
Result as above

[Interval(0, 6, 'new data = remove [6, 10) from [0, 10)'),
 Interval(5, 6, 'new data = remove [6, 15) from [5, 15)'),
 Interval(6, 10, 'new data = remove [0, 6) from [0, 10)'),
 Interval(6, 15, 'new data = remove [5, 6) from [5, 15)')]

More legible adaptation

This solution uses the extended_datafunc() from the previous solution but uses partial initialization instead of the lambda:

from functools import partial
t = IntervalTree([Interval(0, 10), Interval(5, 15)])
t.slice(6, partial(extended_datafunc, point=6))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants