Add ensembling methods for tiling to Anomalib #1226

Merged Oct 24, 2024

Commits (242)
73fa062
Fixed broken links in readme
blaz-r Mar 18, 2023
eeb0b90
Fixed inference command in readme
blaz-r Mar 18, 2023
4c60ab7
Merge branch 'openvinotoolkit:main' into main
blaz-r Mar 23, 2023
b38ea6d
Merge branch 'openvinotoolkit:main' into main
blaz-r Mar 28, 2023
7ea1047
Merge branch 'openvinotoolkit:main' into main
blaz-r Apr 19, 2023
24d32f8
Merge branch 'openvinotoolkit:main' into main
blaz-r Jun 22, 2023
971cd7f
Add tiling for ensemble
blaz-r Jun 22, 2023
53d110b
Add tests for tiling for ensemble
blaz-r Jun 24, 2023
3237379
Moved ensemble tiler to separate file
blaz-r Jun 25, 2023
621e1b4
Modify padim config for ensemble
blaz-r Jun 26, 2023
2c7785c
Add tiling to dataset
blaz-r Jun 26, 2023
b934db3
Revert changes to train
blaz-r Jun 28, 2023
71dabaf
Add tiling to collate fn
blaz-r Jun 28, 2023
ef69183
Fix tiling in collate
blaz-r Jun 29, 2023
6c1357c
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Jun 30, 2023
0845114
Change val. function to protected
blaz-r Jun 30, 2023
7a0ecfa
Add tile number logic
blaz-r Jun 30, 2023
8437875
Move collate fn to separate file
blaz-r Jun 30, 2023
c8ecb6e
Update tests for tiler
blaz-r Jun 30, 2023
86c62c7
Add training loop for ensemble
blaz-r Jun 30, 2023
f9bb615
Add model input size setup
blaz-r Jul 3, 2023
3e2dbda
Move ens config to separate file
blaz-r Jul 3, 2023
28ea8a2
Revert mvtec modifications
blaz-r Jul 3, 2023
c321e60
Remove unused imports in mvtec
blaz-r Jul 3, 2023
cd90ef3
Add batch adjustment to untiling
blaz-r Jul 4, 2023
941439e
Add predict step to ensemble
blaz-r Jul 4, 2023
42023c6
Add comment and docstring to tile joining function
blaz-r Jul 4, 2023
69bd0e8
Move tile joining to separate function
blaz-r Jul 5, 2023
06eb042
Add joining for all tiled data
blaz-r Jul 5, 2023
67b9c3a
Add joining for all box data
blaz-r Jul 5, 2023
94bc485
Refactor pred. joining as modular class
blaz-r Jul 6, 2023
17b5655
Fix box joining
blaz-r Jul 6, 2023
fc22440
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Jul 10, 2023
d1be665
Add label and score joining
blaz-r Jul 10, 2023
752a749
Merge remote-tracking branch 'origin/ensemble' into ensemble
blaz-r Jul 10, 2023
0a7b526
Add ensemble visualization
blaz-r Jul 10, 2023
d091659
Add end of predict hook
blaz-r Jul 11, 2023
19d036e
Add metric computation
blaz-r Jul 11, 2023
a607f73
Fix metric thresholds
blaz-r Jul 12, 2023
62085de
Add removal of individual visualization
blaz-r Jul 12, 2023
c81a612
Add demo1 notebook
blaz-r Jul 14, 2023
faf2839
Add docstrings and cleanup
blaz-r Jul 17, 2023
a359b02
Add memory benchmark
blaz-r Jul 19, 2023
326b304
Add modular class for storing predictions
blaz-r Jul 20, 2023
6f2e789
Add metric to separate class
blaz-r Jul 20, 2023
184bdb2
Refactor to support prediction data class
blaz-r Jul 20, 2023
f872303
Rename predictions class
blaz-r Jul 20, 2023
d7ab2a5
Add filesystem predictions class
blaz-r Jul 20, 2023
1cc9c65
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Jul 20, 2023
ac8b657
Add resized predictions class
blaz-r Jul 21, 2023
d4b3611
Fix joiner for classification task
blaz-r Jul 22, 2023
894d9ea
Add page peak to memory benchmark
blaz-r Jul 22, 2023
95812e4
Add global stats calculation
blaz-r Jul 25, 2023
5842623
Add docstrings to stats calculation
blaz-r Jul 25, 2023
447613b
Refactor joiner for pipeline
blaz-r Jul 26, 2023
3a04dc4
Refactor stats into pipeline
blaz-r Jul 26, 2023
548bf20
Refactor metrics as pipeline block
blaz-r Jul 26, 2023
f04818c
Refactor visualization as pipeline block
blaz-r Jul 26, 2023
ab794eb
Refactor postprocessing into a pipeline
blaz-r Jul 26, 2023
fa21c83
Add normalization and thresholding on joined predictions
blaz-r Jul 26, 2023
6697389
Refactor tiler to accept config file
blaz-r Jul 27, 2023
fbb593f
Add smoothing of tile joins.
blaz-r Jul 27, 2023
1e1939c
Refactor ensemble datamodule preparation
blaz-r Jul 27, 2023
cfebf9b
Remove unused changes in dataloader
blaz-r Jul 28, 2023
b2e3a01
Fix metric configuration
blaz-r Jul 28, 2023
8941a6e
Fix box coordinates in joining
blaz-r Jul 28, 2023
87adbd7
Add ensemble callbacks preparation function
blaz-r Jul 28, 2023
c7c537c
Fix box prediction bug in postprocess
blaz-r Jul 28, 2023
8f103de
Add ensemble params to config
blaz-r Jul 31, 2023
9c7662a
Refactor postprocessing.
blaz-r Jul 31, 2023
adf281f
Refactor post-processing
blaz-r Aug 1, 2023
d6ca860
Refactor predictions
blaz-r Aug 1, 2023
24568f3
Code cleanup
blaz-r Aug 1, 2023
02dc356
Optimize prediction storage
blaz-r Aug 1, 2023
770fe34
Make join smoothing configurable
blaz-r Aug 1, 2023
4c45fe6
Cleanup before PR
blaz-r Aug 1, 2023
6440bd0
Fix stats pipeline
blaz-r Aug 1, 2023
a5bc6cb
Fix logging strings
blaz-r Aug 2, 2023
bb5b125
Fix memory benchmark
blaz-r Aug 2, 2023
2e1840c
Fix tiler issues
blaz-r Aug 2, 2023
057528d
Fix import issues
blaz-r Aug 2, 2023
2b3c4f2
Fix naming in metrics and visualization
blaz-r Aug 2, 2023
f070979
Fix cyclic import
blaz-r Aug 2, 2023
5ab22b7
Make logging lazy
blaz-r Aug 2, 2023
69ccc65
Refactor tiler tests
blaz-r Aug 2, 2023
1fbc67d
Added collate tiling tests
blaz-r Aug 2, 2023
d431637
Added ensemble helper functions tests
blaz-r Aug 2, 2023
c875391
Refactor for dummy ensemble config
blaz-r Aug 2, 2023
29a311a
Refactor for dummy base config
blaz-r Aug 3, 2023
4b0ce71
Add tests for prediction storage
blaz-r Aug 3, 2023
1a0bb1e
Add tests for prediction joiner
blaz-r Aug 3, 2023
4fbf89a
Add tests for visualization
blaz-r Aug 4, 2023
9f1ebe6
Fix small issues in tests
blaz-r Aug 4, 2023
4f9c16f
Add metrics test
blaz-r Aug 4, 2023
333ab23
Add post-processing tests
blaz-r Aug 4, 2023
9719044
Fix tiler to work with different instance
blaz-r Aug 4, 2023
cf9a77a
Move seed setting inside train loop
blaz-r Aug 5, 2023
7d9f670
Fix pipeline stats bug
blaz-r Aug 5, 2023
d917b7f
Rename ensemble config fixture
blaz-r Aug 5, 2023
d885642
Add pipeline tests
blaz-r Aug 5, 2023
9b21e0b
Fix config in pipeline tests
blaz-r Aug 5, 2023
89fade8
Add training script test
blaz-r Aug 5, 2023
9d5371e
Fix types and docstrings
blaz-r Aug 5, 2023
8e19108
Move and rename to tiled_ensemble
blaz-r Aug 7, 2023
866e3de
Fix bug in label joining.
blaz-r Aug 7, 2023
08aef87
Remove memory benchmark
blaz-r Aug 7, 2023
6d37c60
Cleanup files
blaz-r Aug 7, 2023
5cfcb4a
Fix metrics setup
blaz-r Aug 8, 2023
dbfe1c6
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Aug 10, 2023
5845f5b
Rename collate function
blaz-r Aug 10, 2023
6bea6e7
Add license to test files
blaz-r Aug 10, 2023
25d22a4
Rename fixtures
blaz-r Aug 10, 2023
6055f68
Add more comments to tiled ensemble training
blaz-r Aug 10, 2023
9b4aa43
Add start of training log message
blaz-r Aug 10, 2023
8d4b97d
Refactor tiler to have explicit arguments
blaz-r Aug 10, 2023
0a25baa
Refactor pred. storage to have explicit arguments
blaz-r Aug 10, 2023
5bc68f9
Refactor metrics to have explicit arguments
blaz-r Aug 10, 2023
262a8a1
Refactor visualization to have explicit arguments
blaz-r Aug 10, 2023
73a1960
Refactor post-processing to have explicit arguments
blaz-r Aug 10, 2023
5c1a115
Sort imports
blaz-r Aug 10, 2023
8a3b625
Add test ensemble script
blaz-r Aug 10, 2023
89275b9
Fix join smoothing bug
blaz-r Aug 10, 2023
78e71e2
Add more documentation to doc-strings
blaz-r Aug 11, 2023
c3720f8
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Aug 11, 2023
0143db9
Remove unused import
blaz-r Aug 11, 2023
c199494
Add brief tiled ensemble documentation
blaz-r Aug 11, 2023
1976d4e
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Aug 15, 2023
33f1125
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Aug 17, 2023
ae3ac29
Update typehints
blaz-r Aug 17, 2023
1f5a4a7
Make training args more clear
blaz-r Aug 17, 2023
09c0365
Revert addition of no threshold option.
blaz-r Aug 17, 2023
5eb735f
Refactor normalization and threshold config
blaz-r Aug 17, 2023
5cf3ea0
Remove tiled ensemble from docs index
blaz-r Aug 17, 2023
11cbab6
Add comments to clarify parts of ensemble config
blaz-r Aug 17, 2023
b5bb093
Improve ensemble config comments
blaz-r Aug 17, 2023
f99a070
Add num_tiles attribute to tiler.
blaz-r Aug 17, 2023
1f55cb7
Fix metrics process docstring
blaz-r Aug 17, 2023
2ff7281
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Aug 17, 2023
7a083a6
Fix visualization bug and cover with test
blaz-r Aug 17, 2023
16a28a4
Merge remote-tracking branch 'origin/ensemble' into ensemble
blaz-r Aug 17, 2023
17e40ff
Replace strings with enum
blaz-r Aug 17, 2023
b47f2dd
Improve comments in joiner.
blaz-r Aug 17, 2023
6dba87e
Fix bug when model doesn't have anomaly maps.
blaz-r Aug 17, 2023
6fab1a5
Improve docstrings (types, clarify).
blaz-r Aug 17, 2023
97e9cb6
Fix visualization tests
blaz-r Aug 17, 2023
b2c1709
Fix dict membership checks
blaz-r Aug 18, 2023
811c4e8
Add saving of ensemble config file
blaz-r Aug 19, 2023
1ab91ac
Update test script args
blaz-r Aug 19, 2023
808698f
Cover test script with tests
blaz-r Aug 19, 2023
b21129c
Update export warning
blaz-r Aug 19, 2023
fc507c4
Fix case when no test or val data
blaz-r Aug 21, 2023
59ca549
Improve documentation images
blaz-r Aug 21, 2023
d0ded2e
Add images for documentation
blaz-r Aug 21, 2023
9b6b41e
Add codacy suggestion
blaz-r Aug 21, 2023
6d4b0d3
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Aug 21, 2023
fb6ce60
Merge branch 'main' into ensemble
samet-akcay Aug 22, 2023
3b1aa46
Refactor joiner to single class
blaz-r Aug 22, 2023
bfa4aa2
Refactor storage names and config
blaz-r Aug 22, 2023
a7b6860
Update normalization and threshold stage names
blaz-r Aug 22, 2023
82cfc00
Merge remote-tracking branch 'origin/ensemble' into ensemble
blaz-r Aug 22, 2023
46d9ed8
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Aug 31, 2023
e8deacd
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Sep 4, 2023
ceb0bd9
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Sep 5, 2023
c53cb53
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Sep 22, 2023
8ae4c20
Merge branch 'main' into ensemble
blaz-r Sep 26, 2023
ce4d707
Merge branch 'main' into ensemble
blaz-r Oct 26, 2023
640d630
Merge remote-tracking branch 'origin/main' into ensemble
blaz-r Jun 11, 2024
d1d3763
Add transforms independent input size to models
blaz-r Jun 12, 2024
bb04a3b
Make collate function a datamodule attribute
blaz-r Jun 12, 2024
658ee60
Refactor tiled ensemble train into pipeline step
blaz-r Jun 12, 2024
aecf837
Refactor tiled ensemble prediction into pipeline step
blaz-r Jun 13, 2024
15eae34
Refactor tiled ensemble merging into pipeline step
blaz-r Jun 13, 2024
c00f6a1
Refactor tiled ensemble seam smoothing into pipeline step
blaz-r Jun 14, 2024
f7ee730
Refactor tiled stats calculation into pipeline step
blaz-r Jun 14, 2024
9d1f141
Fix ckpt loading when predicting on test set.
blaz-r Jun 14, 2024
6570156
Add logging and add tqdm to pipeline steps.
blaz-r Jun 14, 2024
e49dbfe
Refactor normalization pipeline step
blaz-r Jun 14, 2024
affc8ef
Refactor thresholding into new pipeline job
blaz-r Jun 15, 2024
b934c68
Fix transforms issue when predicting with dataloader
blaz-r Jun 15, 2024
c0791be
Add visualization as new pipeline step
blaz-r Jun 15, 2024
0dac5ed
Add metrics as new pipeline step
blaz-r Jun 15, 2024
fedaddb
Format the code and address some lint problems
blaz-r Jun 15, 2024
3548a50
Add code to skip test if test split is none
blaz-r Jun 15, 2024
551d38d
Add accelerator to metrics and smoothing
blaz-r Jun 15, 2024
d6834d8
Make threshold acq helper function and add to threshold to metrics
blaz-r Jun 15, 2024
2b45cd2
Make a separate test pipeline
blaz-r Jun 15, 2024
9713112
Restructure tiled ensemble files into directories
blaz-r Jun 15, 2024
dac985f
Pipeline code cleanup
blaz-r Jun 15, 2024
b59fcbf
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Aug 2, 2024
aca61df
Remove old tiled ensemble files
blaz-r Aug 2, 2024
27824df
Remove old post processing files
blaz-r Aug 2, 2024
0f935a7
Fix sigma value read in smoothing
blaz-r Aug 2, 2024
5ce8480
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Aug 18, 2024
fb0f6e7
Update stats calc and normalization
blaz-r Aug 18, 2024
6db23cc
Update args naming convention
blaz-r Aug 18, 2024
15db3de
Refactor code for nice config
blaz-r Aug 18, 2024
317c942
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Sep 25, 2024
404843b
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Sep 26, 2024
8acae31
Update docs structure for new system
blaz-r Sep 28, 2024
348a0a6
Cleanup train code
blaz-r Sep 29, 2024
d312563
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Oct 3, 2024
e427de3
Fix test script args
blaz-r Oct 3, 2024
2d7a825
Update box merging
blaz-r Oct 3, 2024
08986e1
Refactor helper function tests
blaz-r Oct 3, 2024
1afaefc
Small changes in helper and engine
blaz-r Oct 3, 2024
eaec7bc
Refactor merging tests
blaz-r Oct 3, 2024
5b27f95
Refactor tiling tests
blaz-r Oct 3, 2024
8f61406
Refactor metrics test
blaz-r Oct 3, 2024
bd47db7
Add support for different threshold methods
blaz-r Oct 3, 2024
4136d17
Format tests
blaz-r Oct 3, 2024
c90a9b9
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Oct 4, 2024
17759db
Change test to predict
blaz-r Oct 4, 2024
2ad4b58
Refactor stats calculation tests
blaz-r Oct 4, 2024
5e83cde
Refactor prediction data tests
blaz-r Oct 4, 2024
ac59dcc
Update metrics tests
blaz-r Oct 4, 2024
eaf5c5f
Move metrics tests to components
blaz-r Oct 5, 2024
04f932f
Refactor seam smoothing tests
blaz-r Oct 5, 2024
f4f7c63
Refactor normalization tests
blaz-r Oct 5, 2024
e434ed3
Move mock stats to conftest
blaz-r Oct 5, 2024
7b2dc2c
Fix typehints for generator
blaz-r Oct 5, 2024
8603a0f
Refactor threshold tests
blaz-r Oct 5, 2024
42965dc
Temporarily disable box minmax
blaz-r Oct 5, 2024
2c440c4
Add tiled ensemble integration test
blaz-r Oct 5, 2024
76e4c50
Fix normalization tests and add additional merging test
blaz-r Oct 5, 2024
491a78c
Add tile collater tests
blaz-r Oct 5, 2024
45d720c
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Oct 6, 2024
17ad0d9
Change dataset in tests to dummy
blaz-r Oct 6, 2024
424c99a
Format and fix linter errors
blaz-r Oct 6, 2024
cb99522
Format and some cleanup
blaz-r Oct 6, 2024
df3094f
Rename predict to eval
blaz-r Oct 6, 2024
faf3c6d
Update docs for refactored version of code
blaz-r Oct 6, 2024
dd72f4f
Cleanup the docs
blaz-r Oct 7, 2024
9b3f6ee
Merge branch 'main' into ensemble
samet-akcay Oct 14, 2024
78ebcd1
Merge branch 'main' into ensemble
samet-akcay Oct 22, 2024
d078c5b
Merge branch 'openvinotoolkit:main' into ensemble
blaz-r Oct 22, 2024
7ca68ed
Update ensemble engine
blaz-r Oct 22, 2024
e6c75d8
Remove boxes from pipelines and tests
blaz-r Oct 22, 2024
fb15eaf
Fix TODO comment issue
blaz-r Oct 22, 2024
9f9fc23
Fix unused model in ens. engine
blaz-r Oct 22, 2024
bdf0df9
Fix path case in test
blaz-r Oct 22, 2024
314662f
Change temporary dir to project_path
blaz-r Oct 22, 2024
8e3d9d5
Change mvtec to MVTec in test path
blaz-r Oct 23, 2024
254 changes: 254 additions & 0 deletions docs/source/markdown/guides/how_to/pipelines/custom_pipeline.md
@@ -0,0 +1,254 @@
# Pipelines

This guide demonstrates how to create a [Pipeline](../../reference/pipelines/index.md) for your custom task.

A pipeline is made up of runners. Each runner is responsible for running a single type of job. A job is the smallest independent unit of work, such as training a model or statistically comparing the outputs of two models. Design each job to be independent of other jobs so that it is agnostic to the runner that runs it. This ensures that jobs can run in parallel or serially without any change to the job itself. The runner does not instantiate a job directly. Instead, it holds a job generator, which parses the configuration and generates the jobs.

## Bird's-Eye View

In this guide we create a dummy significant-parameter search pipeline with two jobs. The first job trains a model and computes the metric. The second job computes the significance of each parameter to the final score using Shapley values. The final output of the pipeline is a plot that shows the contribution of each parameter to the final score. Along the way you will learn how to create a pipeline, a job, and a job generator, and how to expose the pipeline to the `anomalib` CLI. The pipeline is named `experiment`, so by the end of this guide you will be able to generate the significance plot using:

```{literalinclude} ../../../../snippets/pipelines/dummy/anomalib_cli.txt
:language: bash
```

The final directory structure will look as follows:

```{literalinclude} ../../../../snippets/pipelines/dummy/src_dir_structure.txt

```

```{literalinclude} ../../../../snippets/pipelines/dummy/tools_dir_structure.txt
:language: bash
```

## Creating the Jobs

Let's first look at the base class for the [jobs](../../reference/pipelines/base/job.md). It has a few methods defined.

- The `run` method is the main method that is called by the runner. This is where we will train the model and return the model metrics.
- The `collect` method is used to gather the results from all the runs and collate them. This is handy as we want to pass a single object to the next job that contains details of all the runs including the final score.
- The `save` method is used to write any artifacts to the disk. It accepts the gathered results as a parameter. This is useful in a variety of situations. Say, when we want to write the results in a csv file or write the raw anomaly maps for further processing.
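
To make the division of labour between the three methods concrete, here is a minimal, self-contained sketch. It does not use Anomalib's classes: `SketchJob`, `SquareJob`, and `run_serially` are hypothetical names invented for this illustration, and the real base class and runners may differ in detail.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class SketchJob(ABC):
    """Hypothetical stand-in for the job base class (not Anomalib's code)."""

    name: str

    @abstractmethod
    def run(self, task_id: Optional[int] = None) -> Any:
        """Do one unit of work and return its result."""

    @staticmethod
    @abstractmethod
    def collect(results: list) -> Any:
        """Merge the results of all runs into one object."""

    @staticmethod
    @abstractmethod
    def save(results: Any) -> None:
        """Write artifacts for the merged results."""


class SquareJob(SketchJob):
    name = "square"
    saved: list = []  # stands in for files written to disk

    def __init__(self, value: int) -> None:
        self.value = value

    def run(self, task_id: Optional[int] = None) -> int:
        return self.value**2

    @staticmethod
    def collect(results: list) -> list:
        return sorted(results)

    @staticmethod
    def save(results: list) -> None:
        SquareJob.saved = list(results)


def run_serially(jobs: list) -> Any:
    """Roughly what a serial runner does: run every job, then collect and save once."""
    job_class = type(jobs[0])
    results = [job.run() for job in jobs]
    gathered = job_class.collect(results)
    job_class.save(gathered)
    return gathered


gathered = run_serially([SquareJob(3), SquareJob(1), SquareJob(2)])
print(gathered)  # [1, 4, 9]
```

Note that `collect` and `save` are static: they operate on the results of all runs, not on a single job instance.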

Let's create the first job that trains the model and computes the metric. Since it is a dummy example, we will just return a random number as the metric.

```python
import time

import numpy as np

# Base class of all jobs; import path assumed from the Anomalib source layout.
from anomalib.pipelines.components import Job


class TrainJob(Job):
    name = "train"

    def __init__(self, lr: float, backbone: str, stride: int) -> None:
        self.lr = lr
        self.backbone = backbone
        self.stride = stride

    def run(self, task_id: int | None = None) -> dict:
        print(f"Training with lr: {self.lr}, backbone: {self.backbone}, stride: {self.stride}")
        time.sleep(2)  # pretend to train
        score = np.random.uniform(0.1, 0.7)  # dummy metric
        return {"lr": self.lr, "backbone": self.backbone, "stride": self.stride, "score": score}
```

Ignore the `task_id` for now. It is used for parallel jobs. We will come back to it later.

````{note}
The `name` attribute is important and is used to identify the arguments in the job config file.
So, in our case the config `yaml` file will contain an entry like this:

```yaml
...
train:
  lr:
  backbone:
  stride:
...
```
````

Of course, it is up to us to choose what parameters should be shown under the `train` key.

Let's also add the `collect` method so that we return a nice dict object that can be used by the next job.

```python
@staticmethod
def collect(results: list[dict]) -> dict:
    """Turn the list of per-run dicts into a single dict of lists."""
    output: dict = {}
    for key in results[0]:
        output[key] = []
    for result in results:
        for key, value in result.items():
            output[key].append(value)
    return output
```
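
As a quick check of what `collect` produces, the sketch below applies the same logic to two hand-written results. The parameter values and scores are made up for illustration.

```python
def collect(results: list) -> dict:
    """Turn a list of per-run dicts into one dict of lists."""
    output: dict = {key: [] for key in results[0]}
    for result in results:
        for key, value in result.items():
            output[key].append(value)
    return output


# Two made-up run results, shaped like the dict returned by `TrainJob.run`.
runs = [
    {"lr": 0.1, "backbone": "resnet18", "stride": 3, "score": 0.42},
    {"lr": 0.5, "backbone": "wide_resnet50", "stride": 5, "score": 0.61},
]
collected = collect(runs)
print(collected["lr"])  # [0.1, 0.5]
print(collected["score"])  # [0.42, 0.61]
```

The column-oriented layout is convenient because it converts directly into a table with one row per run.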

We can also define a `save` method that writes the dictionary as a csv file.

```python
# Requires `from pathlib import Path` and `import pandas as pd` at module level.
@staticmethod
def save(results: dict) -> None:
    """Save results in a csv file."""
    results_df = pd.DataFrame(results)
    file_path = Path("runs") / TrainJob.name
    file_path.mkdir(parents=True, exist_ok=True)
    results_df.to_csv(file_path / "results.csv", index=False)
```
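
If you want to try the saving step without installing pandas, the following sketch writes the same kind of file with the standard-library `csv` module instead. The helper `save_as_csv` and its `root` argument are hypothetical, and the sketch writes to a temporary directory rather than `runs/`.

```python
import csv
import tempfile
from pathlib import Path


def save_as_csv(results: dict, root: Path, job_name: str) -> Path:
    """Write a dict of equal-length lists as a CSV file with one row per run."""
    out_dir = root / job_name
    out_dir.mkdir(parents=True, exist_ok=True)
    file_path = out_dir / "results.csv"
    with file_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(results.keys())  # header row
        writer.writerows(zip(*results.values()))  # one row per run
    return file_path


with tempfile.TemporaryDirectory() as tmp:
    path = save_as_csv({"lr": [0.1, 0.5], "score": [0.42, 0.61]}, Path(tmp), "train")
    text = path.read_text()

print(text)
```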

The entire job class is shown below.

```{literalinclude} ../../../../snippets/pipelines/dummy/train_job.txt
:language: python
```

Now we need a way to generate this job when the pipeline is run. To do this we need to subclass the [JobGenerator](../../reference/pipelines/base/generator.md) class.

The job generator is the actual object that is attached to a runner and is responsible for parsing the configuration and generating jobs. It has two methods that need to be implemented.

- `generate_job`: This method accepts the configuration as a dictionary and, optionally, the results of the previous job. The train job does not need the results of a previous job, so we will ignore that argument.
- `job_class`: This holds the reference to the class of the job that the generator will yield. It is used to inform the runner about the job that is being run, and is used to access the static attributes of the job such as its name, collect method, etc.

Let's first start by defining the configuration that the generator will accept. The train job requires three parameters: `lr`, `backbone`, and `stride`. We will also add another parameter that defines the number of experiments we want to run. One way to define it would be as follows:

```yaml
train:
  experiments: 10
  lr: [0.1, 0.99]
  backbone:
    - resnet18
    - wide_resnet50
  stride:
    - 3
    - 5
```

For this example the specification is defined as follows.

1. The number of experiments is set to 10.
2. Learning rate is sampled from a uniform distribution in the range `[0.1, 0.99]`.
3. The backbone is chosen from the list `["resnet18", "wide_resnet50"]`.
4. The stride is chosen from the list `[3, 5]`.

```{note}
While the `[ ]` and `-` syntax in `yaml` both signify a list, for visual disambiguation this example uses `[ ]` to denote a closed interval and `-` for a list of options.
```

With this defined, we can define the generator class as follows.

```{literalinclude} ../../../../snippets/pipelines/dummy/train_generator.txt
:language: python
```

Since this is a dummy example, we generate the next experiment randomly. In practice, you would use a more sophisticated method that relies on your validation metrics to generate the next experiment.
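
The sampling rules from the specification can be sketched in isolation as follows. This is a standalone illustration, not the generator from the listing above: `generate_train_params` and its `seed` argument are hypothetical, and the config is passed as a plain dict rather than parsed from `yaml`.

```python
import random
from collections.abc import Iterator
from typing import Optional


def generate_train_params(config: dict, seed: Optional[int] = None) -> Iterator[dict]:
    """Yield one randomly sampled parameter set per experiment."""
    rng = random.Random(seed)
    low, high = config["lr"]  # `[ ]` entry: interval to sample from
    for _ in range(config["experiments"]):
        yield {
            "lr": rng.uniform(low, high),
            "backbone": rng.choice(config["backbone"]),  # `-` entries: list of options
            "stride": rng.choice(config["stride"]),
        }


config = {
    "experiments": 10,
    "lr": [0.1, 0.99],
    "backbone": ["resnet18", "wide_resnet50"],
    "stride": [3, 5],
}
params = list(generate_train_params(config, seed=0))
print(len(params))  # 10
print(params[0])
```

In a real generator each sampled dict would be used to construct one job instance instead of being returned directly.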

```{admonition} Challenge
:class: tip
As a challenge, define your own configuration and a generator to parse that configuration.
```

Okay, so now we can train the model. We still need a way to find out which parameters contribute the most to the final score. We will do this by computing Shapley values, which attribute a share of the final score to each parameter.

Let's first start by adding the library to our environment:

```bash
pip install shap
```

The following listing shows the job that computes the Shapley values and saves a plot that shows the contribution of each parameter to the final score. A quick rundown, without going into the details of the job (they are irrelevant to the pipeline), is as follows. We create a `RandomForestRegressor` that is trained on the parameters to predict the final score. We then compute the Shapley values to identify the parameters that have the most significant impact on the model performance. Finally, the `save` method saves the plot so we can visually inspect the results.
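
If Shapley values are new to you, the toy sketch below may help build intuition. It is not what the job does: instead of fitting a regressor and calling `shap`, it computes exact Shapley values by enumerating every ordering of three parameters against a made-up score function. The names `shapley_values` and `toy_score`, and all the numbers, are invented for this illustration.

```python
from itertools import permutations
from math import factorial


def shapley_values(players: list, value) -> dict:
    """Average each player's marginal contribution over all orderings."""
    totals = {player: 0.0 for player in players}
    for order in permutations(players):
        coalition: set = set()
        for player in order:
            before = value(frozenset(coalition))
            coalition.add(player)
            totals[player] += value(frozenset(coalition)) - before
    n_orders = factorial(len(players))
    return {player: total / n_orders for player, total in totals.items()}


def toy_score(coalition: frozenset) -> float:
    """Made-up score: `lr` and `backbone` help, and help more together; `stride` does nothing."""
    score = 0.0
    if "lr" in coalition:
        score += 0.3
    if "backbone" in coalition:
        score += 0.1
    if {"lr", "backbone"} <= coalition:
        score += 0.2  # interaction bonus, split evenly between the two
    return score


values = shapley_values(["lr", "backbone", "stride"], toy_score)
print({name: round(val, 3) for name, val in values.items()})
# {'lr': 0.4, 'backbone': 0.2, 'stride': 0.0}
```

The values sum to the score of the full parameter set (0.6), which is the property that makes them useful for attributing a final score to individual parameters.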

```{literalinclude} ../../../../snippets/pipelines/dummy/significance_job.txt

```

Great! Now that we have the job, we need its generator, as before. Since this job only needs the results from the previous stage, we don't need to define any configuration for it. Let's quickly write the generator as well.

```{literalinclude} ../../../../snippets/pipelines/dummy/significance_job_generator.txt

```

## Experiment Pipeline

So now we have the jobs, and a way to generate them. Let's look at how we can chain them together to achieve what we want. We will use the [Pipeline](../../reference/pipelines/base/pipeline.md) class to define the pipeline.

When creating a custom pipeline, there is only one important method that we need to implement. That is the `_setup_runners` method. This is where we chain the runners together.

```{literalinclude} ../../../../snippets/pipelines/dummy/pipeline_serial.txt
:language: python
```

In this example we use `SerialRunner` for running each job. It is a simple runner that runs the jobs in a serial manner. For more information on `SerialRunner` look [here](../../reference/pipelines/runners/serial.md).
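
Conceptually, chaining stages serially means handing each stage its own section of the config plus the results of the stage before it. The sketch below shows that idea with plain functions. It is an assumption-laden illustration, not the `Pipeline` or `SerialRunner` implementation: `run_pipeline`, `train_stage`, `significance_stage`, and the scoring rule are all hypothetical.

```python
from collections.abc import Callable
from typing import Any

# A stage takes (its config section, previous stage results) and returns its results.
Stage = Callable[[dict, Any], Any]


def run_pipeline(stages: list, config: dict) -> Any:
    """Run the stages in order, feeding each one the previous stage's results."""
    previous: Any = None
    for name, stage in stages:
        previous = stage(config.get(name, {}), previous)
    return previous


def train_stage(stage_config: dict, previous: Any) -> dict:
    """First stage: ignores `previous` and scores each learning rate (made-up rule)."""
    lrs = stage_config["lr"]
    return {"lr": lrs, "score": [1.0 - abs(lr - 0.3) for lr in lrs]}


def significance_stage(stage_config: dict, previous: dict) -> dict:
    """Second stage: needs no config, only the results of the train stage."""
    scores = previous["score"]
    best = max(range(len(scores)), key=scores.__getitem__)
    return {"best_lr": previous["lr"][best]}


result = run_pipeline(
    [("train", train_stage), ("significance", significance_stage)],
    {"train": {"lr": [0.1, 0.3, 0.9]}},
)
print(result)  # {'best_lr': 0.3}
```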

Okay, so we have the pipeline. How do we run it? To do this, let's create a simple entrypoint in the `tools` folder of Anomalib.

Here is how the directory looks.

```{literalinclude} ../../../../snippets/pipelines/dummy/tools_dir_structure.txt
:language: bash
```

As you can see, we have the `config.yaml` file in the same directory. Let's quickly populate `experiment.py`.

```python
from anomalib.pipelines.experiment_pipeline import ExperimentPipeline

if __name__ == "__main__":
    ExperimentPipeline().run()
```

Alright! Time to take it on the road.

```bash
python tools/experimental/experiment/experiment.py --config tools/experimental/experiment/config.yaml
```

If all goes well, you should see the summary plot in `runs/significant_feature/summary_plot.png`.

## Exposing to the CLI

Now that you have your shiny new pipeline, you can expose it as a subcommand to `anomalib` by adding an entry to the pipeline registry in `anomalib/cli/pipelines.py`.

```python
if try_import("anomalib.pipelines"):
    ...
    from anomalib.pipelines import ExperimentPipeline

PIPELINE_REGISTRY: dict[str, type[Pipeline]] | None = {
    "experiment": ExperimentPipeline,
    ...
}
```

With this you can now call

```{literalinclude} ../../../../snippets/pipelines/dummy/anomalib_cli.txt
:language: bash
```
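
The registry is essentially a mapping from subcommand name to pipeline class. A rough sketch of how such a lookup could dispatch a subcommand is shown below. The classes and the `dispatch` function are hypothetical stand-ins, not the actual CLI code.

```python
class SketchPipeline:
    """Hypothetical stand-in for the pipeline base class."""

    def run(self) -> str:
        return f"ran {type(self).__name__}"


class SketchExperimentPipeline(SketchPipeline):
    """Hypothetical stand-in for the experiment pipeline."""


# Subcommand name -> pipeline class, mirroring the shape of the registry above.
SKETCH_REGISTRY = {"experiment": SketchExperimentPipeline}


def dispatch(subcommand: str) -> str:
    """Look up the pipeline registered under `subcommand` and run it."""
    if subcommand not in SKETCH_REGISTRY:
        known = ", ".join(sorted(SKETCH_REGISTRY))
        raise ValueError(f"Unknown pipeline '{subcommand}'. Known pipelines: {known}")
    return SKETCH_REGISTRY[subcommand]().run()


outcome = dispatch("experiment")
print(outcome)  # ran SketchExperimentPipeline
```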

Congratulations! You have successfully created a pipeline that trains a model and computes the significance of the parameters to the final score 🎉

```{admonition} Challenge
:class: tip
This example used a random model hence the scores were meaningless. Try to implement a real model and compute the scores. Look into which parameters lead to the most significant contribution to your score.
```

## Final Tweaks

Before we end, let's look at a few final tweaks that you can make to the pipeline.

First, let's run the initial model training in parallel. Since all jobs are independent, we can use the [ParallelRunner](../../reference/pipelines/runners/parallel.md). Since the `TrainJob` is a dummy job in this example, the pool of parallel jobs is set to the number of experiments.

```{literalinclude} ../../../../snippets/pipelines/dummy/pipeline_parallel.txt

```

The entire pipeline now takes less time to run. This is handy when you have a large number of experiments and each job takes substantial time to run.
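
The effect is easy to reproduce outside Anomalib. The sketch below runs four independent dummy jobs first serially and then in a pool sized to the number of jobs. It uses a standard-library thread pool as a stand-in, which is an assumption made for illustration and says nothing about how `ParallelRunner` is implemented. `dummy_train` and its scoring rule are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def dummy_train(lr: float) -> dict:
    """Pretend to train for a moment, then return a made-up score."""
    time.sleep(0.2)
    return {"lr": lr, "score": 1.0 - lr}


learning_rates = [0.1, 0.2, 0.3, 0.4]

start = time.perf_counter()
serial = [dummy_train(lr) for lr in learning_rates]
serial_seconds = time.perf_counter() - start

start = time.perf_counter()
# One worker per job, as in the dummy example where the pool size equals the number of experiments.
with ThreadPoolExecutor(max_workers=len(learning_rates)) as pool:
    parallel = list(pool.map(dummy_train, learning_rates))
parallel_seconds = time.perf_counter() - start

print(f"serial: {serial_seconds:.2f}s, parallel: {parallel_seconds:.2f}s")
```

Both runs return the same results in the same order because the jobs do not depend on each other, which is exactly the property the job design asks for.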

Now on to the second tweak. When running the pipeline, we don't want the terminal cluttered with the output of each run. Anomalib provides a handy decorator that temporarily hides the output of a function. It suppresses everything written to standard output and standard error unless an exception is raised. Let's add it to `TrainJob`:

```python
from anomalib.utils.logging import hide_output


class TrainJob(Job):
    ...

    @hide_output
    def run(self, task_id: int | None = None) -> dict:
        ...
```

You will no longer see the output of the `print` statement in `TrainJob.run` in the terminal.
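
For intuition, a decorator with this behaviour can be sketched with the standard library alone. This is not Anomalib's `hide_output`. `hide_output_sketch` is a hypothetical name, and replaying the captured text when an exception occurs is an assumption about what "unless an exception is raised" should mean.

```python
import contextlib
import functools
import io


def hide_output_sketch(func):
    """Capture stdout and stderr while `func` runs; replay them only if it fails."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
                return func(*args, **kwargs)
        except Exception:
            # The redirect has ended here, so this goes to the real stdout.
            print(buffer.getvalue(), end="")
            raise

    return wrapper


@hide_output_sketch
def noisy_add(a: int, b: int) -> int:
    print("adding")  # hidden when the call succeeds
    return a + b


result = noisy_add(1, 2)
print(result)  # 3
```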