Spineplots and spinograms for factor y-variables #233

Open
wants to merge 11 commits into
base: main


zeileis
Collaborator

@zeileis zeileis commented Oct 10, 2024

Fixes #2

This PR will eventually support spineplots (factor ~ factor) and spinograms (factor ~ numeric) using the type_ infrastructure from the epic #222

There is already some good progress. For example, you can do:

library("tinyplot")
aq = transform(
  airquality,
  Month = factor(Month, labels = month.abb[unique(Month)]),
  Hot = Temp > median(Temp)
)
tinyplot(Hot ~ Wind, facet = ~ Month, data = aq, type = type_spineplot(), breaks = 4)

spineplot airquality

ttnc <- as.data.frame(Titanic)
tinyplot(Survived ~ Sex, facet = ~ Class, data = ttnc, weights = ttnc$Freq, type = type_spineplot())

spineplot Titanic

But as you can see the axis labeling is not great and the by handling does not work properly, yet. I will ask various questions, mostly to you Vincent @vincentarelbundock, I'm afraid. But I'm optimistic that we can sort out the details.

tbc

@zeileis
Collaborator Author

zeileis commented Oct 10, 2024

First questions:

@vincentarelbundock
Collaborator

* The `xaxs = "i"` and `yaxs = "i"` are still not passed to the right place. Do I need to add these as explicit arguments somewhere else?

What functions consume these arguments? With your current code, the variables themselves should already be available in the tinyplot() scope, but I don't see any call anywhere that uses these variables as inputs. Should they be passed to the facet drawing function, the window creation, or to draw_spineplot()? I think things are set up correctly in your type_spineplot() code. It's probably just a matter of carrying them through as inputs to the proper functions.

* Some arguments that I set in the `data_` function I want to access in the `draw_` function. What is the best way to do that? Currently I include these in the return value (e.g., https://github.com/grantmcdermott/tinyplot/blob/spineplot/R/type_spineplot.R#L156) and then fetch them from the `parent.frame()` (https://github.com/grantmcdermott/tinyplot/blob/spineplot/R/type_spineplot.R#L21-L22). But that's probably not a good solution...

What "shape" do these arguments have? One simple option might be to use data_spineplot() to insert this info as new columns with idiosyncratic names into datapoints. Then, draw_spineplot() has access to that data and can retrieve the info directly.

@zeileis
Collaborator Author

zeileis commented Oct 11, 2024

The xaxs and yaxs properties have to be set when creating the outer plot to which the types are then adding the actual content. So I think it needs to be passed to draw_facet_window() and then ultimately plot.window(). Is there some way to achieve this? Or do I need to add arguments xaxs and yaxs explicitly for tinyplot() and then pass them through everything?

Regarding the extra arguments: The most prominent case are the breaks for the spinograms. These are the points at which I split the numeric x variable into categories. I want to compute them on the entire data only once, that's why I put them in data_spineplot().

As they are neither a scalar, nor of length "n", I did not put them as a column in datapoints. But I could put them there if I pad with NAs. So it's technically possible but also not elegant.

You could argue that I ought to cut() the x into categories in the data_spineplot() function already. And that's probably a good idea. But I would still need to pass along the underlying breaks because I need these for making nice axis labels.
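A minimal base R sketch of the idea in the last three paragraphs: compute the breaks once on the full x variable, bin with cut(), and keep the raw breaks around for axis labels. It assumes hist()-style breaks and uses only base R on the airquality data; it is not the code in data_spineplot().

```r
# Sketch: compute spinogram breaks once on the full x variable, then reuse
# them for binning and (later) for axis labels. Assumes hist()-style breaks.
x = airquality$Wind
y = factor(airquality$Temp > median(airquality$Temp), labels = c("cool", "hot"))

# breaks = 4 is only a suggestion to hist(), which picks "pretty" cut points
breaks = hist(x, breaks = 4, plot = FALSE)$breaks

# Bin x with the shared breaks; include.lowest keeps the minimum in the first bin
xcat = cut(x, breaks = breaks, include.lowest = TRUE)

# Conditional proportions of y within each x bin (heights of the stacked bars)
tab = table(xcat, y)
prop = prop.table(tab, margin = 1)

# Bar widths are proportional to the bin counts
widths = rowSums(tab) / sum(tab)
```

Note that `breaks` has one more element than there are bins, which is why it fits neither a scalar nor a length-n column.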

@vincentarelbundock
Collaborator

The xaxs and yaxs properties have to be set when creating the outer plot to which the types are then adding the actual content. So I think it needs to be passed to draw_facet_window() and then ultimately plot.window(). Is there some way to achieve this? Or do I need to add arguments xaxs and yaxs explicitly for tinyplot() and then pass them through everything?

If we think users will want to specify xaxs explicitly themselves in other plots, then we could add them to the main function. But if you think it's mostly an internal thing, then you could add:

xaxs = yaxs = NULL

just before type_data() is called. Then, your function overrides the default NULL value. draw_facet_window() can then be modified to accept the internal xaxs value, ignore it if it is NULL, or act correctly if it is non-NULL.
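A rough sketch of the "ignore if NULL" handling, using a hypothetical stand-in function (draw_window_sketch() is not part of tinyplot) and a null graphics device so nothing is written to disk:

```r
# Hypothetical stand-in for the window-drawing step: only touch par() for
# settings that a type has actually supplied (i.e., that are non-NULL).
draw_window_sketch = function(xlim, ylim, xaxs = NULL, yaxs = NULL) {
  settings = Filter(Negate(is.null), list(xaxs = xaxs, yaxs = yaxs))
  if (length(settings) > 0) {
    op = par(settings)            # par() accepts a named list
    on.exit(par(op), add = TRUE)  # restore the previous settings afterwards
  }
  plot.new()
  plot.window(xlim = xlim, ylim = ylim)
  par("usr")                      # user coordinates of the plotting region
}

pdf(NULL)  # null device
usr_default = draw_window_sketch(c(0, 1), c(0, 1))
usr_tight = draw_window_sketch(c(0, 1), c(0, 1), xaxs = "i", yaxs = "i")
invisible(dev.off())
```

With the default "r" style the axis range is extended slightly beyond the limits, while "i" uses exactly the requested limits, which is what spineplots need.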

As they are neither a scalar, nor of length "n", I did not put them as a column in datapoints. But I could put them there if I pad with NAs. So it's technically possible but also not elegant.

Here's one idea: In the main tinyplot function, just before calling type_data(), create an empty list called type_info. Then, data_spineplot() overrides that empty list with a named list of whatever you need in the drawing function. Finally, we modify the main tinyplot() function to pass type_info to type_draw().

Since every type_draw() function accepts ..., type_info will be ignored most of the time. But then custom types like yours have an easy mechanism to pass arbitrary data from type_data() to type_draw().
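The mechanism could look roughly like the toy pipeline below. All function names are hypothetical stand-ins for the data step, the draw step, and the main function; the real tinyplot internals differ.

```r
# Toy versions of the data/draw steps; all names here are hypothetical.
data_sketch = function(datapoints, breaks = 4, ...) {
  brks = hist(datapoints$x, breaks = breaks, plot = FALSE)$breaks
  datapoints$xcat = cut(datapoints$x, breaks = brks, include.lowest = TRUE)
  # Return the modified data plus a named list of anything the draw step needs
  list(datapoints = datapoints, type_info = list(breaks = brks))
}

draw_sketch = function(datapoints, type_info = list(), ...) {
  # A real draw function would plot; here we just report what it received
  sprintf("%d bins from %g to %g", nlevels(datapoints$xcat),
    min(type_info$breaks), max(type_info$breaks))
}

main_sketch = function(x, ...) {
  datapoints = data.frame(x = x)
  type_info = list()                 # empty default, set before the data step
  out = data_sketch(datapoints, ...)
  datapoints = out$datapoints
  type_info = out$type_info          # the type overrides the empty list
  draw_sketch(datapoints, type_info = type_info)
}

res = main_sketch(airquality$Wind)
```

Draw functions that do not need the extra information can simply swallow type_info in their ... argument.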

…ass them on to draw_facet_window() where they are set via par()
@zeileis
Collaborator Author

zeileis commented Oct 11, 2024

Great, Vincent @vincentarelbundock, very useful. I've added the xaxs and yaxs arguments now - also to tinyplot() as they are standard par() arguments. Grant @grantmcdermott, let me know if you disagree and would not have exported them.

@zeileis
Collaborator Author

zeileis commented Oct 11, 2024

And nice idea with the type_info, I've also added that now! 💡

I didn't do extensive tests, yet, but I think that tinyplot() and plot() now give the same output for factor ~ factor and factor ~ numeric! 🎉

Next I want to polish the faceted displays and then have a stab at handling by variables. For the facets I have two questions:

  1. Is there a recommended way to increase the margins between the displays? Because spine plots employ both the left and the right y-axis for labels, we need a little bit more space here.
  2. Because for spineplots type_draw is drawing the axes rather than draw_facet_window: Can type_draw know whether it is in a facet on the very left or very right and at the top or at the bottom, respectively? Then, we could draw fewer axes, if we want.

@vincentarelbundock
Collaborator

I didn't do extensive tests, yet, but I think that tinyplot() and plot() now give the same output for factor ~ factor and factor ~ numeric! 🎉

Very, very cool!

1. Is there a recommended way to increase the margins between the displays? Because spine plots employ both the left and the right y-axis for labels, we need a little bit more space here.

I don't know about margins. Paging @grantmcdermott

2. Because for spineplots `type_draw` is drawing the axes rather than `draw_facet_window`: Can `type_draw` know whether it is in a facet on the very left or very right and at the top or at the bottom, respectively? Then, we could draw fewer axes, if we want.

I see a facet_window_args object in the main tinyplot function with a bunch of information in it. I bet if you pass this to type_draw(), you could match it to ifacet which gives you the index of the current facet.

@grantmcdermott
Owner

grantmcdermott commented Oct 11, 2024

Very exciting 🚀

  1. Is there a recommended way to increase the margins between the displays? Because spine plots employ both the left and the right y-axis for labels, we need a little bit more space here.

Yes. That's the fmar parameter, which can be accessed/set either: 1) temporarily as part of the list passed to tinyplot(...., facet.args = list(fmar = xx)), or 2) permanently via tpar(fmar).

Reading and typing quickly on my phone, so I hope I didn't misunderstand the question. I'll be able to look properly in an hour or so.

Edit: Details and default values here. https://grantmcdermott.com/tinyplot/man/tpar.html#additional-graphical-parameters

@zeileis
Collaborator Author

zeileis commented Oct 11, 2024

Thanks, Grant. Then I see two ways of setting this:

  1. We include facet.args in the fargs list for type_data so that it can be modified and subsequently passed on to draw_facet_window.
  2. We just call tpar(fmar = ...) within the type_data function.

2 is leaner but I guess 1 is cleaner?

@vincentarelbundock
Collaborator

Yeah, I don't see a good reason to keep away too many things from type_draw(). Seems like a general design.

@grantmcdermott
Owner

grantmcdermott commented Oct 11, 2024

RE: facet margin adjustments. Another option would be to check for type=='spineplot' and then do an automatic adjustment similar to what we do for other special cases here:

tinyplot/R/facet.R

Lines 127 to 152 in 068b431

# Set facet margins (i.e., gaps between facets)
if (is.null(facet.args[["fmar"]])) {
  fmar = tpar("fmar")
} else {
  if (length(facet.args[["fmar"]]) != 4) {
    warning(
      "`fmar` has to be a vector of length four, e.g.",
      "`facet.args = list(fmar = c(b,l,t,r))`.",
      "\n",
      "Resetting to fmar = c(1,1,1,1) default.",
      "\n"
    )
    fmar = tpar("fmar")
  } else {
    fmar = facet.args[["fmar"]]
  }
}
# We need to adjust for n>=3 facet cases for correct spacing...
if (nfacets >= 3) {
  ## ... exception for 2x2 cases
  if (!(nfacet_rows == 2 && nfacet_cols == 2)) fmar = fmar * .75
}
# Extra reduction if no plot frame to reduce whitespace
if (isFALSE(frame.plot)) {
  fmar = fmar - 0.5
}

E.g. In the last bit of the above code chunk, we subtract 0.5 lines from the fmar values if the plot frame is turned off (to reduce unnecessary whitespace between the individual facets).

Summarising: maybe we just try adding the following below line 152?

if (type == "spineplot") fmar = fmar + 1    # or however many lines you want to increase by

@zeileis
Collaborator Author

zeileis commented Oct 12, 2024

Thanks for the advice, as usual very helpful. I now did the following:

  • Avoid any type == "spineplot" to keep type processing as modular as possible.
  • Pass facet.args to type_data so that type_spineplot can increase the default fmar.
  • Pass facet_window_args to type_draw so that the axis(4) is only drawn in the right-most panel in each row.
  • For this I added a new helper function is_facet_position (in facet.R) which can determine whether the current facet panel is on the "left" or the "right" and at the "top" or the "bottom" of the facet grid. Maybe we want to leverage this in other types as well?
  • I added an interpretation of xaxt/yaxt to type_spineplot although I had to adjust their meaning a little bit because the axes are non-standard.
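For illustration, a position check of this kind could be sketched as follows. This is a hypothetical helper, not the PR's actual is_facet_position() implementation, and it assumes that facets are numbered row by row from left to right.

```r
# Hypothetical helper (not the PR's is_facet_position implementation).
# Assumes facets are numbered row by row, left to right.
facet_position_sketch = function(position, ifacet, nfacets, nfacet_cols) {
  position = match.arg(position, c("left", "right", "top", "bottom"))
  switch(position,
    left   = (ifacet - 1L) %% nfacet_cols == 0L,
    # the last facet counts as "right" even if its row is incomplete
    right  = ifacet %% nfacet_cols == 0L || ifacet == nfacets,
    top    = ifacet <= nfacet_cols,
    # "bottom" means that there is no facet directly below
    bottom = ifacet + nfacet_cols > nfacets
  )
}

# Example: 5 facets arranged in 3 columns (rows: 1 2 3 / 4 5)
pos = sapply(c("left", "right", "top", "bottom"), function(p) {
  sapply(1:5, function(i) facet_position_sketch(p, i, nfacets = 5, nfacet_cols = 3))
})
```

In `pos`, each column is one position and each row one facet, so a draw function could, for example, call axis(4) only where the "right" entry is TRUE.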

@grantmcdermott
Owner

For this I added a new helper function is_facet_position (in facet.R) which can determine whether the current facet panel is on the "left" or the "right" and at the "top" or the "bottom" of the facet grid. Maybe we want to leverage this in other types as well?

Thanks @zeileis, I'll take a look. Would this supplant (duplicate?) the existing logic that we use here for only drawing axes of "outer" facets if the plot frame is turned off?

tinyplot/R/facet.R

Lines 38 to 39 in 068b431

oxaxis = tail(ifacet, nfacet_cols)
oyaxis = seq(1, nfacets, by = nfacet_cols)

and
https://github.com/grantmcdermott/tinyplot/blob/main/R/facet.R#L250-L265

@zeileis
Collaborator Author

zeileis commented Oct 12, 2024

Thanks, Grant, I overlooked that feature. Why is that logic only applied if there is no frame.plot? Shouldn't this be disentangled? This would be helpful in general I guess, but for spineplots in particular because I don't have a standard plotting region, so frame.plot has to be treated differently.

For spineplots, at the moment, I always repeat axis 1 and 3 but axis 4 is only shown for the last panel in a row. But I'm happy to adapt.

@zeileis
Collaborator Author

zeileis commented Oct 13, 2024

Summary

The type_spineplot() is pretty decent now. The main missing feature is by which I will try to tackle next. Some fine details of margins and legends in the case of facets (see above) can still be improved but are no show-stoppers, I think.

My latest additions are:

  • flip is supported now (by actually flipping the split direction and not just swapping the variables)
  • more granular control of axes/xaxt/yaxt so that frames around the rectangles can be switched off
  • facet colors are now supported in the usual way - by deriving simple sequential HCL-based palettes within each facet

Examples

library("tinyplot")
ttnc = as.data.frame(Titanic)
tinyplot(Survived ~ Sex | Class, facet = "by", data = ttnc, weights = ttnc$Freq, type = type_spineplot(),
  palette = "Dark 2", facet.args = list(nrow = 1), axes = "t", lwd = 5)

spineplot1

tinyplot(Survived ~ Class | Sex, facet = "by", data = ttnc, weights = ttnc$Freq, type = type_spineplot(),
  palette = "Dark 2", facet.args = list(ncol = 1), axes = "t", lwd = 5, flip = TRUE)

spineplot2

Problems

Legend symbols: Note that I have to set lwd = 5 in order to produce thick lines in the legend. It would be better to create filled rectangles there. Can I modify the type_draw function to achieve this?

Colors: Often one would want to select a single set of colors coding the levels of the y-variable like this:

spineplot3

Users transitioning from plot() would expect the following code to work but it leads to an error:

p = palette.colors(3, "Pastel 1")
tinyplot(Species ~ Sepal.Width, data = iris, breaks = 4, type = type_spineplot(), col = p)
## Error: `col` must be of length 1 or 1.

For now, I have worked around this by giving the type_spineplot() function another col argument but this isn't ideal. Maybe we would have to special case this once type_spineplot() is an official plot type?

tinyplot(Species ~ Sepal.Width, data = iris, breaks = 4, type = type_spineplot(col = p))

@grantmcdermott
Owner

Whoa, these look great. I'll try to do a proper review tomorrow. (@vincentarelbundock please feel free to jump in first if you have time.) Really excited to see this long-standing issue nearing a resolution!

@zeileis
Collaborator Author

zeileis commented Oct 13, 2024

Honestly, I wasn't sure whether we would really get here because there were so many special cases in the old monolithic code 🙈

Also I expected the modularization to be even more complicated. But Vincent's trick of passing a lot of arguments to the workers which can then overwrite them via list2env() is really neat. I hadn't seen this before. 💡

@vincentarelbundock
Collaborator

I hadn't seen this before. 💡

Me neither 😭

@zeileis
Collaborator Author

zeileis commented Oct 13, 2024

How did you come across this idea? Was this Grant's input? (I didn't follow the details about the initial discussion of the modularization.)

Bonus question: The standard design for type_draw is to cycle through every facet level within each by level. For spineplots I would need to draw all by levels simultaneously within a given facet level. In this case I would only draw anything if iby == 1L but I would need to get the relevant subset of the data. I guess I could piece it together from data_by but I wondered whether you see a more elegant approach?

@vincentarelbundock
Collaborator

vincentarelbundock commented Oct 13, 2024

No, I just started by returning a bunch of arguments in a list and reassigning them. Then, I got lazy and used list2env() as a hack. Only later I realized it was kind of a neat trick.
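A small self-contained sketch of that pattern, with hypothetical function names: the worker returns a named list of only the values it wants to change, and the caller writes them back into its own environment with list2env().

```r
# Hypothetical worker: receives settings and returns only those it changes
worker_sketch = function(xaxs, yaxs, xlab, ...) {
  list(xaxs = "i", yaxs = "i")   # xlab is left alone
}

caller_sketch = function() {
  xaxs = yaxs = NULL
  xlab = "Wind"
  out = worker_sketch(xaxs = xaxs, yaxs = yaxs, xlab = xlab)
  # Assign every element of `out` into the current function environment,
  # overwriting the local variables of the same name
  list2env(out, envir = environment())
  list(xaxs = xaxs, yaxs = yaxs, xlab = xlab)
}

res = caller_sketch()
```

The trade-off is that the reassignment is implicit: nothing in the caller shows which variables a worker may overwrite, so the returned names act as an informal contract.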

Don't have a great idea now, and I'm not going to be able to concentrate on this for a few days at least since it's holiday here. Sorry!

@zeileis
Collaborator Author

zeileis commented Oct 13, 2024

That's perfectly fine! I should really be doing other things as well (not vacations unfortunately). So I'll wait for Grant's feedback first and then return to handling the by variable later. Enjoy the vacations.

@grantmcdermott
Owner

Sorry, I haven't had time to review this properly. I also have to head out of town now... but I just pushed a simple tweak (workaround) that gives square legend symbols.

pkgload::load_all("~/Documents/Projects/tinyplot")
#> ℹ Loading tinyplot

ttnc = as.data.frame(Titanic)

tinyplot(Survived ~ Sex | Class, facet = "by", data = ttnc, weights = ttnc$Freq, type = type_spineplot(),
         palette = "Dark 2", facet.args = list(nrow = 1), axes = "t")

It's a bit hacky and I'll also flag that the bespoke coloring override here means that we don't match the offset black correctly. For example, see the legend key for "1st" is darker than the plot region here.

tinyplot(Survived ~ Sex | Class, facet = "by", data = ttnc, weights = ttnc$Freq, type = type_spineplot(),
         facet.args = list(nrow = 1), axes = "t")

More generally, I need to think about the best way to pass arguments like col, lty, lwd etc. back and forth between the type_spineplot() constructors and the other drawing arguments. (I know that this is tricky b/c of spineplot's unique axes and colouring requirements, which are currently done "just in time" as part of draw_spineplot().)

@zeileis
Collaborator Author

zeileis commented Oct 17, 2024

Grant, thanks for this! Some comments and thoughts below. Nothing urgent, so feel free not to respond any time soon...

  • I did the special casing of the default black (#000000FF) for two reasons: (a) to match the default colors of plot()/spineplot() in base R and (b) because starting from black makes the sequential palette rather dark.
  • Currently the color computations are indeed done just in time in type_draw but I could easily move them to type_data.
  • However, as far as I can tell this would still not enable me to pass the right arguments to draw_legend, or am I overlooking something?
  • In general, I think it would be good to give the type constructors some control over the legend as well. Either via type_data or possibly via an additional type_legend or so, if necessary.
  • It would also be desirable to get rid of most of the special cases for certain types, especially in draw_legend. For the basic type = "p", "l", "o", and friends such special cases are ok IMO but everything else would ideally be in the type functions.

@grantmcdermott
Owner

However, as far as I can tell this would still not enable me to pass the right arguments to draw_legend, or am I overlooking something?

I don't think you're missing something. Or, at least, I didn't see how to do it either. Speaking of which...

In general, I think it would be good to give the type constructors some control over the legend as well. Either via type_data or possibly via an additional type_legend or so, if necessary.

Yeah, this is a great shout-out. I don't know if we (collectively) have the time to fix this before the next release... which I was hoping to push through within the next week or two, once this gets merged. But doing so would greatly simplify / negate some of the legacy workarounds that we've carried through from pre-modularization. (Basically, just +1 to your final bullet point.)

Development

Successfully merging this pull request may close these issues.

Wishlist: Support factors as x or y
3 participants