Adding facility for separation of the log likelihood and log prior via a Joint struct #75

Open
HarrisonWilde opened this issue Jun 8, 2021 · 7 comments


@HarrisonWilde
Member

Many of the packages that build upon AMCMC (AbstractMCMC) define models with a log density function. In general this is the joint density, i.e. the sum of the log prior and the log likelihood. For MCMCTempering and some other situations (@yebai made me aware of these; I cannot remember them off the top of my head, but was assured I might have some luck advocating for this, as I believe it has a reasonably wide use case), I propose we add something like the following to AMCMC:

struct Joint{Tℓprior, Tℓll} <: Function
    ℓprior      :: Tℓprior
    ℓlikelihood :: Tℓll
end

function (joint::Joint)(θ)
    return joint.ℓprior(θ) .+ joint.ℓlikelihood(θ)
end

The purpose is to allow a user, if they so desire, to define the log prior and log likelihood separately and pass the combined object in as the log density of a DensityModel in AMH (AdvancedMH), as the log density function in AHMC (AdvancedHMC), etc. In most cases there is not much motivation to do this, as you would simply sum the components inside the density function. However, having them separable is critical for MCMCTempering to work, and in general it is a cheap and very simple interface for operating on the prior and likelihood individually.
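A minimal, self-contained sketch of how the proposed Joint would be used. The toy Gaussian prior and likelihood (and the data `ys`) are illustrative assumptions, and wrapping the result in AdvancedMH's DensityModel is omitted so that the snippet has no dependencies:

```julia
# The Joint struct as proposed above.
struct Joint{Tℓprior, Tℓll} <: Function
    ℓprior      :: Tℓprior
    ℓlikelihood :: Tℓll
end

# Calling a Joint evaluates the joint log density (log prior + log likelihood).
(joint::Joint)(θ) = joint.ℓprior(θ) .+ joint.ℓlikelihood(θ)

# Hypothetical toy model: θ ~ N(0, 1), yᵢ ~ N(θ, 1), additive constants dropped.
const ys = [0.5, 1.5, 1.0]
ℓprior(θ) = -θ^2 / 2
ℓlikelihood(θ) = -sum(abs2, ys .- θ) / 2

joint = Joint(ℓprior, ℓlikelihood)

# The components stay accessible after construction (what tempering needs)...
lp = joint.ℓprior(1.0)       # -0.5
ll = joint.ℓlikelihood(1.0)  # -0.25
# ...while the object still behaves like an ordinary log density function.
lj = joint(1.0)              # -0.75
```

Because Joint subtypes Function and is callable, any code that expects a plain log density function keeps working unchanged.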

I am open to some discussion on the design. It is a slightly odd thing to expose to a user who is not using tempering or something similar, so I imagine it would be documented and flagged as an option rather than explicitly encouraged. Perhaps others have ideas for achieving the same functionality with a nicer design.

In tempering, we take the user-defined Joint from the model and apply an inverse temperature β to the likelihood term, resulting in the following TemperedJoint; hopefully this illustrates what I mean:

struct TemperedJoint{Tℓprior, Tℓll, T<:AbstractFloat} <: Function
    ℓprior      :: Tℓprior
    ℓlikelihood :: Tℓll
    β           :: T
end

function (tj::TemperedJoint)(θ)
    return tj.ℓprior(θ) .+ (tj.ℓlikelihood(θ) .* tj.β)
end
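A small sketch of what the β parameter does, assuming toy components with additive constants dropped: β = 0 leaves only the prior, β = 1 recovers the untempered joint, and intermediate values interpolate between them. The ladder of β values is an illustrative assumption, not MCMCTempering's actual schedule. Note that the struct constrains β to an AbstractFloat, so integer values such as `1` would not be accepted.

```julia
struct TemperedJoint{Tℓprior, Tℓll, T<:AbstractFloat} <: Function
    ℓprior      :: Tℓprior
    ℓlikelihood :: Tℓll
    β           :: T
end

# Tempered log density: only the likelihood term is scaled by β.
(tj::TemperedJoint)(θ) = tj.ℓprior(θ) .+ (tj.ℓlikelihood(θ) .* tj.β)

# Hypothetical components, additive constants dropped.
ℓprior(θ) = -θ^2 / 2
ℓlikelihood(θ) = -(θ - 2)^2 / 2

# A small ladder of inverse temperatures, as a tempering scheme might use.
βs = [0.0, 0.5, 1.0]
ladder = [TemperedJoint(ℓprior, ℓlikelihood, β) for β in βs]

# β = 0 gives the prior alone; β = 1 recovers the untempered joint.
vals = [tj(1.0) for tj in ladder]  # [-0.5, -0.75, -1.0]
```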

Currently I need to add both Joint and TemperedJoint to any tempering implementation that depends on MCMCTempering (aside from Turing, which works regardless, since the user never defines the log density directly). If Joint could go in AMCMC, I could add TemperedJoint to MCMCTempering, and adding tempering support to a sampler would become even more trivial.

Would like to hear thoughts; this is a relatively simple PR if people are happy to add it.

@devmotion
Member

I'm not particularly happy about enforcing a specific struct; it seems a bit restrictive. Also it seems difficult (impossible?) to evaluate both the prior and likelihood in a single execution of the model. What about the following design:

  • Use logprior(model, x), logjoint(model, x) (both already defined for DynamicPPL.Model in DynamicPPL) and Distributions.loglikelihood(model, x) (also already used in DynamicPPL) as part of a (not enforced) interface for AbstractModels in AbstractMCMC
  • Define logprior_loglikelihood (also part of the interface) with the fallback
    logprior_loglikelihood(model, x) = (logprior(model, x), loglikelihood(model, x))
    for evaluating both the prior and the likelihood. If a model can evaluate both together more efficiently, it can implement the method; otherwise it just works.
  • Use it to define
    function logjoint(model, x)
        lp, ll = logprior_loglikelihood(model, x)
        return lp + ll
    end
    (of course, if there is a more efficient method to get the log joint, a model should implement it)
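A self-contained sketch of the proposed interface with its fallbacks. The functions are defined locally here as plain generic functions; they are not taken from AbstractMCMC, DynamicPPL, or Distributions, and `GaussianModel` is a hypothetical toy model that implements only the two required methods:

```julia
# Interface functions, declared locally (hypothetical, for illustration only).
function logprior end
function loglikelihood end

# Fallback: evaluate the two terms separately.
logprior_loglikelihood(model, x) = (logprior(model, x), loglikelihood(model, x))

# Fallback for the joint, built on top of logprior_loglikelihood.
function logjoint(model, x)
    lp, ll = logprior_loglikelihood(model, x)
    return lp + ll
end

# Hypothetical toy model: θ ~ N(0, 1), yᵢ ~ N(θ, 1), additive constants dropped.
struct GaussianModel
    ys::Vector{Float64}
end

logprior(::GaussianModel, θ) = -θ^2 / 2
loglikelihood(m::GaussianModel, θ) = -sum(abs2, m.ys .- θ) / 2

m = GaussianModel([0.5, 1.5, 1.0])
logprior_loglikelihood(m, 1.0)  # (-0.5, -0.25)
logjoint(m, 1.0)                # -0.75
```

The model author writes two methods; `logprior_loglikelihood` and `logjoint` come for free and can still be overridden when a faster path exists.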

This would allow

  • models to define only logprior(model, x) and loglikelihood(model, x) and get the rest for free
  • models to exploit possible efficiency gains when evaluating logprior and loglikelihood together
  • you to define something like
    struct TemperedLogjoint{B}
        beta::B
    end
    function (f::TemperedLogjoint)(model, x)
        lp, ll = logprior_loglikelihood(model, x)
        return lp + f.beta * ll
    end
    (or whatever other design you want to use for the tempered log joint, e.g., with an additional argument beta instead)
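A sketch showing that, under this design, tempering works for any model that implements only `logprior` and `loglikelihood`. The interface functions are redeclared locally so the snippet is self-contained, and `ToyModel` is a hypothetical example model:

```julia
# Interface functions, declared locally (hypothetical, for illustration only).
function logprior end
function loglikelihood end
logprior_loglikelihood(model, x) = (logprior(model, x), loglikelihood(model, x))

# Tempered log joint, as suggested above.
struct TemperedLogjoint{B}
    beta::B
end

function (f::TemperedLogjoint)(model, x)
    lp, ll = logprior_loglikelihood(model, x)
    return lp + f.beta * ll
end

# Hypothetical model: only the two required methods are implemented.
struct ToyModel end
logprior(::ToyModel, x) = -x^2 / 2
loglikelihood(::ToyModel, x) = -(x - 2)^2 / 2

model = ToyModel()
tempered = [TemperedLogjoint(β)(model, 1.0) for β in (0.0, 0.5, 1.0)]  # [-0.5, -0.75, -1.0]
```

Unlike TemperedJoint, nothing here requires the model to store its prior and likelihood as fields; the tempering code only relies on the methods.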

@devmotion
Member

Just an additional note: of course, this suggestion would still allow you to use a struct for implementing logprior or logprior_loglikelihood etc. for a specific class of models. But the interface would not enforce it.

@HarrisonWilde
Member Author

Also it seems difficult (impossible?) to evaluate both the prior and likelihood in a single execution of the model.

By this I presume you mean separately evaluating just the prior / likelihood? This is relatively simple; in the case of a DensityModel, say, you can do:

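using AdvancedMH  # provides DensityModel; Joint is defined as in the proposal above

model = DensityModel(Joint(lprior, llikelihood))
# then, to evaluate the prior and likelihood separately (fields of Joint):
model.logdensity.ℓprior(z), model.logdensity.ℓlikelihood(z)
# or to evaluate the joint:
model.logdensity(z)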

Of course here we are not quite conforming to the intended usage pattern, as we must access the logdensity as a component of the model rather than the usual logdensity(model, z).

As for your proposal, I think it sounds good. Just to clarify: this conceptual splitting of the joint is only necessary in AMH / AHMC, where DynamicPPL.Model is not used; as you have seen in the Turing PR, for DynamicPPL models we use a different approach via contexts. I think you know this already, and I am still getting my head around the idea. It is a tradeoff I suppose, as I am stuck in a mindset of trying to make it as simple as possible for someone to support tempering where the only assumption is that they work off AMCMC.

The part I am not following as a result is, what would end-user usage look like with this approach? Presumably they'd define a logprior, a loglikelihood, but then there would still be a step where in AMH say we would need to define a new model constructor that then combines the two, or expect the user to do this themselves? Could you give an example of what this would look like as to me it seems that regardless we end up back to a model where the interior density function is the sum of the two components like you said, but these components cannot be accessed retrospectively as is required for tempering.

@devmotion
Member

By this I presume you mean separately evaluating just the prior / likelihood?

No, I mean executing e.g. a Turing model and then accumulating log prior and log likelihood in one run (these quantities are evaluated by executing the model, there's usually no closed form expression or function).
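A schematic sketch of the point being made: in a probabilistic program the prior and likelihood contributions are produced by executing the model body, so a single run can accumulate both, and a model can overload `logprior_loglikelihood` to exploit that. `LogProbAccumulator`, `execute!` and `ProgramModel` are hypothetical names; this is not how Turing or DynamicPPL implement it internally.

```julia
# Hypothetical accumulator: one execution of the model fills in both terms.
mutable struct LogProbAccumulator
    logprior::Float64
    loglikelihood::Float64
end
LogProbAccumulator() = LogProbAccumulator(0.0, 0.0)

# Toy "model body" in the spirit of a probabilistic program:
# θ ~ N(0, 1), yᵢ ~ N(θ, 1), additive constants dropped.
function execute!(acc::LogProbAccumulator, θ, ys)
    acc.logprior += -θ^2 / 2                 # contribution of the prior statement
    for y in ys
        acc.loglikelihood += -(y - θ)^2 / 2  # contribution of each observation
    end
    return acc
end

struct ProgramModel
    ys::Vector{Float64}
end

# Overload of the combined method: a single run yields both terms.
function logprior_loglikelihood(model::ProgramModel, θ)
    acc = execute!(LogProbAccumulator(), θ, model.ys)
    return acc.logprior, acc.loglikelihood
end

both = logprior_loglikelihood(ProgramModel([0.5, 1.5, 1.0]), 1.0)  # (-0.5, -0.25)
```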

@HarrisonWilde
Member Author

Ah I see. Yes, it is a funny one, because this whole approach is only required for anything external to DynamicPPL. For the model structs in the packages I mention, we only gain functionality by being able to interact with the log prior and log likelihood after definition, and we would not lose anything this way provided the struct is callable; that is really all that is needed here, I think.

@devmotion
Member

It is a tradeoff I suppose as I am stuck in a mindset of trying to make it as simple as possible for someone to support tempering where the only assumption is that they work off AMCMC.

With the suggestion, users/developers would just have to implement logprior(model, x) and loglikelihood(model, x) for their models. Then tempering would just work (and also things such as logjoint(model, x) and logprior_loglikelihood(model, x)). It is up to the downstream packages how exactly they implement these functions - with a struct where they bundle some functions, by overloading the methods directly etc.

@devmotion
Member

Presumably they'd define a logprior, a loglikelihood, but then there would still be a step where in AMH say we would need to define a new model constructor that then combines the two, or expect the user to do this themselves? Could you give an example of what this would look like as to me it seems that regardless we end up back to a model where the interior density function is the sum of the two components like you said, but these components cannot be accessed retrospectively as is required for tempering.

No, there would not necessarily be such a step as the DynamicPPL example shows.

Basically, overloading logprior(model, x) and loglikelihood(model, x) supports both the case

struct Model{F}
    execute::F
end

function logprior(model::Model, x)
    return model.execute(PriorContext(), x)
end
function loglikelihood(model::Model, x)
    return model.execute(LikelihoodContext(), x)
end

(similar to what e.g. DynamicPPL does) and

struct Model{P,L}
    prior::P
    likelihood::L
end

logprior(model::Model, x) = logpdf(model.prior, x)
loglikelihood(model::Model, x) = model.likelihood(x)

(basically what EllipticalSliceSampling does: https://github.com/TuringLang/EllipticalSliceSampling.jl/blob/d54630e397b15efdae9c0ef25af839f62e8f35c2/src/model.jl#L3-L13).
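The first, context-based design can be made runnable as follows. The contexts here are simplified local stand-ins (DynamicPPL has contexts with similar names, but these are not its types), and `execute_toy` is a hypothetical model body that returns only the terms selected by the context:

```julia
# Local stand-ins for evaluation contexts (hypothetical, simplified).
abstract type AbstractContext end
struct PriorContext <: AbstractContext end
struct LikelihoodContext <: AbstractContext end

struct Model{F}
    execute::F
end

logprior(model::Model, x) = model.execute(PriorContext(), x)
loglikelihood(model::Model, x) = model.execute(LikelihoodContext(), x)

# Toy model body, additive constants dropped: the context decides which
# terms are returned.
const ys = [0.5, 1.5, 1.0]
execute_toy(::PriorContext, θ) = -θ^2 / 2
execute_toy(::LikelihoodContext, θ) = -sum(abs2, ys .- θ) / 2

model = Model(execute_toy)
lp, ll = logprior(model, 1.0), loglikelihood(model, 1.0)  # (-0.5, -0.25)
```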

The logprior and loglikelihood functions allow you to define your model in whatever way is suitable for the task or sampler. For example, in EllipticalSliceSampling we are never interested in the log joint probability, only in the prior and the likelihood, so it seems quite unintuitive to demand that a Joint struct be implemented. With the logprior and loglikelihood functions, such a struct is not needed.
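A sketch of the second, EllipticalSliceSampling-style design, in which the model stores a prior distribution and a likelihood function and the joint is never formed. `StdNormal` and its `logpdf` are local stand-ins for Distributions.jl's `Normal()` and `logpdf`, used only to keep the snippet dependency-free; `tempered_logjoint` is a hypothetical helper showing that a tempered joint can still be derived generically when some other algorithm needs it.

```julia
# Local stand-in for a standard normal prior (Distributions.jl would normally
# provide Normal() and logpdf).
struct StdNormal end
logpdf(::StdNormal, x) = -(x^2 + log(2π)) / 2

struct Model{P,L}
    prior::P
    likelihood::L
end

logprior(model::Model, x) = logpdf(model.prior, x)
loglikelihood(model::Model, x) = model.likelihood(x)

# Hypothetical likelihood, additive constants dropped.
model = Model(StdNormal(), x -> -(x - 2)^2 / 2)

# An ESS-style sampler would only need these two; no joint is formed.
lp = logprior(model, 0.0)       # -log(2π) / 2
ll = loglikelihood(model, 0.0)  # -2.0

# If another algorithm needs a (tempered) joint, it follows from the same methods.
tempered_logjoint(model, x, β) = logprior(model, x) + β * loglikelihood(model, x)
```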


2 participants