# 🚀 LLM Foundry v0.10.0
## New Features
### Registry for ICL datasets (#1252)

ICL datasets are now available through a registry, so custom in-context learning (ICL) evaluation datasets can be registered and then selected by name.
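The registry follows the usual name-to-class lookup pattern. A minimal standalone sketch of that pattern (hypothetical names; see `llmfoundry.registry` in the repo for the real API):

```python
# Sketch of the registry pattern: a registry maps a string key to a
# dataset class, so a YAML config can select an ICL dataset by name.
# `Registry`, `icl_datasets`, and `MyMultipleChoiceDataset` are
# illustrative names, not the actual llm-foundry API.
class Registry:
    def __init__(self):
        self._items = {}

    def register(self, name):
        """Decorator that records the decorated class under `name`."""
        def decorator(cls):
            self._items[name] = cls
            return cls
        return decorator

    def get(self, name):
        return self._items[name]


icl_datasets = Registry()


@icl_datasets.register('my_multiple_choice')
class MyMultipleChoiceDataset:
    """A custom ICL dataset, now selectable by its registered key."""


# A config referencing 'my_multiple_choice' resolves to the class above.
dataset_cls = icl_datasets.get('my_multiple_choice')
```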
### Curriculum Learning Callback (#1256)

You can now switch dataloaders during training, which enables curriculum learning:
```yaml
train_loader:
  <dataloader parameters>
callbacks:
  curriculum_learning:
  - duration: <number>tok
    train_loader: # matches top level train_loader
      <dataloader parameters>
  - duration: <number>tok
    train_loader:
      <dataloader parameters>
  - duration: <number>tok
    train_loader:
      <dataloader parameters>
```
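The switching semantics can be sketched as follows: each schedule entry covers `duration` tokens, and training moves to the next dataloader once that many tokens have been consumed. This is illustrative only (`active_stage` and the token counts are hypothetical, not part of the callback's API):

```python
# Illustrative sketch of duration-based dataloader switching; not the
# llm-foundry implementation.
def active_stage(schedule, tokens_seen):
    """Return the index of the schedule entry covering `tokens_seen`."""
    boundary = 0
    for i, stage in enumerate(schedule):
        boundary += stage['duration']
        if tokens_seen < boundary:
            return i
    return len(schedule) - 1  # past the schedule: stay on the last loader


# Hypothetical schedule: durations parsed from "<number>tok" strings.
schedule = [
    {'duration': 1_000_000},
    {'duration': 2_000_000},
    {'duration': 3_000_000},
]
```

With this schedule, the second loader becomes active after 1M tokens and the third after 3M cumulative tokens.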
### [Experimental] Interweave Attention Layers (#1299)
You can now override the default block configuration for specific layers, enabling, for example, per-layer sliding window sizes or reuse of the previous layer's KV cache:
```yaml
model:
  ...
  (usual model configs)
  ...
  block_overrides:
    order:
    - name: default
    - order:
      - name: sliding_window_layer
      - name: sliding_window_layer_reuse
      - name: sliding_window_layer
      - repeat: 2
        name: sliding_window_layer_reuse
      - name: reuse_kv_layer
      repeat: 2
    overrides:
      sliding_window_layer:
        attn_config:
          sliding_window_size: 1024
      sliding_window_layer_reuse:
        attn_config:
          sliding_window_size: 1024
          reuse_kv_layer_idx: -1 # Relative index of the layer whose kv cache to reuse
      reuse_kv_layer:
        attn_config:
          reuse_kv_layer_idx: -6 # Relative index of the layer whose kv cache to reuse
```
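A sketch of how an `order` spec with nested `order` and `repeat` entries flattens into a per-layer list of block names (illustrative only; `expand` is a hypothetical helper, not the llm-foundry implementation):

```python
# Illustrative expansion of a block_overrides `order` spec into a flat
# list of per-layer block names. Each item is repeated `repeat` times
# (default 1); a nested `order` expands recursively.
def expand(order):
    layers = []
    for item in order:
        repeat = item.get('repeat', 1)
        if 'order' in item:
            layers.extend(expand(item['order']) * repeat)
        else:
            layers.extend([item['name']] * repeat)
    return layers


# The example config above: one default layer, then two copies of a
# six-layer pattern, for 13 layers in total.
spec = [
    {'name': 'default'},
    {'repeat': 2, 'order': [
        {'name': 'sliding_window_layer'},
        {'name': 'sliding_window_layer_reuse'},
        {'name': 'sliding_window_layer'},
        {'repeat': 2, 'name': 'sliding_window_layer_reuse'},
        {'name': 'reuse_kv_layer'},
    ]},
]
```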
## Bug fixes
## What's Changed
- Bump Version to 0.10.0.dev0 by @KuuCi in #1255
- Fix typo in setup.py by @XiaohanZhangCMU in #1263
- Update TE Dockerfile by @j316chuck in #1265
- Revert "Update TE Dockerfile (#1265)" by @j316chuck in #1266
- Revert to older TE version by @mvpatel2000 in #1267
- Bump Composer to version 0.23.2 by @dakinggg in #1269
- fix linting by @milocress in #1270
- Add torch 2.3.1 docker images by @dakinggg in #1275
- Make expandable segments on by default by @b-chu in #1278
- Adds CI for torch 2.3.1 by @dakinggg in #1281
- Update README.md to use variables by @milocress in #1282
- Add registry for ICL datasets by @sanjari-orb in #1252
- Fix typo in CI by @dakinggg in #1284
- Fix backwards compatibility for ICL arg by @dakinggg in #1286
- Fix packing + streaming + resumption by @dakinggg in #1277
- Dbfs HF by @KuuCi in #1214
- Bump mlflow to 2.13.2 by @KuuCi in #1285
- Add missing dependency group by @dakinggg in #1287
- Update Dockerfile with TE main by @j316chuck in #1273
- Fix TE HF checkpoint saving by @j316chuck in #1280
- added systemMetricsMonitor callback by @JackZ-db in #1260
- Extendability refactors by @dakinggg in #1290
- Small refactor for update batch size by @dakinggg in #1293
- Bump min composer version to 0.23.3 by @dakinggg in #1294
- Fix grad accum typing by @dakinggg in #1296
- Bump composer to 0.23.4 by @mvpatel2000 in #1297
- Allow passing in lbl_process_group directly by @dakinggg in #1298
- Add `all` transforms to train script by @dakinggg in #1300
- Add Retries to run_query by @KuuCi in #1302
- Bumping mlflow version to include buffering by @JackZ-db in #1303
- Ignore mosaicml logger for exception if excepthook is active by @jjanezhang in #1301
- Add curriculum learning callback by @b-chu in #1256
- Avoid circular import in hf checkpointer by @dakinggg in #1304
- Remove codeql workflow by @dakinggg in #1305
- Update CI test to v0.0.8 by @KuuCi in #1306
- Upgrade ci testing to 0.0.8 by @dakinggg in #1307
- Bump ci-testing to 0.0.9 by @dakinggg in #1310
- Fix 4 gpu tests by @dakinggg in #1311
- Bump recommended images to 2.3.1 and remove 2.3.0 CI by @dakinggg in #1312
- Provide default seed value in TrainConfig, matching EvalConfig by @mvpatel2000 in #1315
- Refactor hf checkpointer for config transformations by @irenedea in #1318
- Allows interweaving of arbitrary kinds of 'attention' layers, like sliding window, reuse prev layer kv cache etc. by @ShashankMosaicML in #1299
- Add optional logging of text output to EvalOutputLogging by @sjawhar in #1283
## New Contributors
- @sanjari-orb made their first contribution in #1252
- @JackZ-db made their first contribution in #1260
- @sjawhar made their first contribution in #1283
**Full Changelog**: v0.9.1...v0.10.0