Thanks for creating this! I'm researching induction of behavioral change and was working on a generalized version for a project. I figured I might as well help out on an established repo.
I'm interested in contributing: documenting the code and helping to refactor and generalize it, potentially towards a proper package. Since this is a bit of a hybrid, preliminary repo, working towards a more consistent style would also be nice.
Before I start, I'd like to know a bit more about some things:
You mention your personal workflow. Would you provide some more information regarding this? I understand that you are using it to create ablated model weights, but knowing the process will help with documentation, commenting, and restructuring.
What are some of your future plans? I noticed generalization being a big one, which is a goal I share. Since this process can be framed as contrastive search and edit, this should be decently straightforward, though it will require a few modifications that might disturb your current workflow.
I noticed more than a few apparently unused methods, and some variables that either need to be defined or need a reference to their instance (self). (See the method calculate_mean_dirs and the variable direction.) I understand some methods to be remnants of earlier stages of this process, which may or may not be needed depending on your current workflow.
Are there any specific areas that you'd like to focus contributions on first?
Do you have a preference for code or docstring styles?
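To make the "contrastive search and edit" framing concrete, here is a minimal sketch in plain Python. It is not this repo's API: the function names (contrastive_direction, ablate) and the toy activations are hypothetical, and the difference-of-means step is just one common way to do the "search" half. The "edit" half projects the found direction out of an activation.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def unit(v):
    """Scale v to unit length; a zero vector has no direction."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def contrastive_direction(target_acts, baseline_acts):
    """Search: unit difference between the two sets' mean activations."""
    mu_t = mean_vector(target_acts)
    mu_b = mean_vector(baseline_acts)
    return unit([t - b for t, b in zip(mu_t, mu_b)])


def ablate(activation, direction):
    """Edit: remove the component of `activation` along a unit `direction`."""
    coeff = dot(activation, direction)
    return [a - coeff * d for a, d in zip(activation, direction)]


# Toy data (assumption): the two sets differ only along the first axis.
target_acts = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
baseline_acts = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

direction = contrastive_direction(target_acts, baseline_acts)
edited = ablate([5.0, 2.0, -1.0], direction)
```

Nothing here is specific to refusal: any pair of contrasting prompt sets could supply the two activation lists, which is why generalizing along these lines seems feasible.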
Looking forward to contributing!
Oh wow, thanks for all this! I hadn't announced this anywhere so wasn't expecting much attention on it yet.
Probably the best way to express this is for me to provide, soon, a few toy scripts/notebooks that each demonstrate an isolated concept in using this library.
Don't worry about disturbing my workflow, as the goal of this project is to have a library that minimizes the "need" for a workflow and instead offers a nice, straightforward process one can run. Generalization is a huge part of the push to a library, from dealing with different models to potentially dealing with multidimensional features (if such a thing is possible).
You'd be correct that most of that is remnants of past lives of my personal script.
4a. Documentation, which is kind of on me.
4b. Finding better techniques/methods that require less human intervention
4c. Improving compatibility with the transformers space
4d. Improving memory usage and optimizing
4e. HF model export process.
Admittedly, I looked up your GitHub after seeing the models on Reddit/Hugging Face. I was happy to see this!
Good to know about the workflow, flexibility, and your ambitions. I can think of several uses besides simple refusal ablation, especially with inference-time interventions.
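As a rough illustration of what I mean by an inference-time intervention, here is a plain-Python sketch that leaves the weights untouched and instead wraps a layer so its output is edited on every call. Everything in it is hypothetical (with_ablation, toy_layer, the strength parameter); in a real PyTorch model the same idea would more likely be attached through a forward hook than through a wrapper like this.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def with_ablation(layer_fn, direction, strength=1.0):
    """Wrap `layer_fn` so its output loses its component along `direction`.

    Assumes `direction` is unit-norm. strength=1.0 removes the component
    entirely; strength=0.0 leaves the output unchanged.
    """
    def wrapped(x):
        out = layer_fn(x)
        coeff = strength * dot(out, direction)
        return [o - coeff * d for o, d in zip(out, direction)]
    return wrapped


def toy_layer(x):
    """Stand-in for a model layer (assumption): doubles every value."""
    return [2.0 * v for v in x]


direction = [0.0, 1.0, 0.0]          # hypothetical unit feature direction
patched_layer = with_ablation(toy_layer, direction)

plain_out = toy_layer([1.0, 2.0, 3.0])
patched_out = patched_layer([1.0, 2.0, 3.0])
```

Because the original layer is never modified, the intervention can be switched on and off, or scaled, per request, which is the main practical difference from writing ablated weights to disk.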
Would you clarify "transformers space" in point (4c)?