# 🚀 LLM Foundry v0.10.0
## New Features
### Registry for ICL datasets (#1252)

ICL datasets are now available through a registry, so custom in-context learning (ICL) evaluation datasets can be registered and then selected by name.
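The registry follows the usual name-to-class lookup pattern. A minimal standalone sketch of that pattern (hypothetical names; see `llmfoundry.registry` in the repo for the real API):

```python
# Sketch of the registry pattern: a registry maps a string key to a
# dataset class, so a YAML config can select an ICL dataset by name.
# `Registry`, `icl_datasets`, and `MyMultipleChoiceDataset` are
# illustrative names, not the actual llm-foundry API.
class Registry:
    def __init__(self):
        self._items = {}

    def register(self, name):
        """Decorator that records the decorated class under `name`."""
        def decorator(cls):
            self._items[name] = cls
            return cls
        return decorator

    def get(self, name):
        return self._items[name]


icl_datasets = Registry()


@icl_datasets.register('my_multiple_choice')
class MyMultipleChoiceDataset:
    """A custom ICL dataset, now selectable by its registered key."""


# A config referencing 'my_multiple_choice' resolves to the class above.
dataset_cls = icl_datasets.get('my_multiple_choice')
```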
### Curriculum Learning Callback (#1256)

You can now switch dataloaders during training, which enables curriculum learning:
```yaml
train_loader:
  <dataloader parameters>
callbacks:
  curriculum_learning:
  - duration: <number>tok
    train_loader: # matches top level train_loader
      <dataloader parameters>
  - duration: <number>tok
    train_loader:
      <dataloader parameters>
  - duration: <number>tok
    train_loader:
      <dataloader parameters>
```
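The switching semantics can be sketched as follows: each schedule entry covers `duration` tokens, and training moves to the next dataloader once that many tokens have been consumed. This is illustrative only (`active_stage` and the token counts are hypothetical, not part of the callback's API):

```python
# Illustrative sketch of duration-based dataloader switching; not the
# llm-foundry implementation.
def active_stage(schedule, tokens_seen):
    """Return the index of the schedule entry covering `tokens_seen`."""
    boundary = 0
    for i, stage in enumerate(schedule):
        boundary += stage['duration']
        if tokens_seen < boundary:
            return i
    return len(schedule) - 1  # past the schedule: stay on the last loader


# Hypothetical schedule: durations parsed from "<number>tok" strings.
schedule = [
    {'duration': 1_000_000},
    {'duration': 2_000_000},
    {'duration': 3_000_000},
]
```

With this schedule, the second loader becomes active after 1M tokens and the third after 3M cumulative tokens.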
### [Experimental] Interweave Attention Layers (#1299)
You can now override the default block configuration for specific layers, enabling, for example, per-layer sliding window sizes or reuse of the previous layer's KV cache:
```yaml
model:
  ...
  (usual model configs)
  ...
  block_overrides:
    order:
    - name: default
    - order:
      - name: sliding_window_layer
      - name: sliding_window_layer_reuse
      - name: sliding_window_layer
      - repeat: 2
        name: sliding_window_layer_reuse
      - name: reuse_kv_layer
      repeat: 2
    overrides:
      sliding_window_layer:
        attn_config:
          sliding_window_size: 1024
      sliding_window_layer_reuse:
        attn_config:
          sliding_window_size: 1024
          reuse_kv_layer_idx: -1 # Relative index of the layer whose kv cache to reuse
      reuse_kv_layer:
        attn_config:
          reuse_kv_layer_idx: -6 # Relative index of the layer whose kv cache to reuse
```
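A sketch of how an `order` spec with nested `order` and `repeat` entries flattens into a per-layer list of block names (illustrative only; `expand` is a hypothetical helper, not the llm-foundry implementation):

```python
# Illustrative expansion of a block_overrides `order` spec into a flat
# list of per-layer block names. Each item is repeated `repeat` times
# (default 1); a nested `order` expands recursively.
def expand(order):
    layers = []
    for item in order:
        repeat = item.get('repeat', 1)
        if 'order' in item:
            layers.extend(expand(item['order']) * repeat)
        else:
            layers.extend([item['name']] * repeat)
    return layers


# The example config above: one default layer, then two copies of a
# six-layer pattern, for 13 layers in total.
spec = [
    {'name': 'default'},
    {'repeat': 2, 'order': [
        {'name': 'sliding_window_layer'},
        {'name': 'sliding_window_layer_reuse'},
        {'name': 'sliding_window_layer'},
        {'repeat': 2, 'name': 'sliding_window_layer_reuse'},
        {'name': 'reuse_kv_layer'},
    ]},
]
```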
## Bug fixes
## What's Changed
- Bump Version to 0.10.0.dev0 by @KuuCi in #1255
- Fix typo in setup.py by @XiaohanZhangCMU in #1263
- Update TE Dockerfile by @j316chuck in #1265
- Revert "Update TE Dockerfile (#1265)" by @j316chuck in #1266
- Revert to older TE version by @mvpatel2000 in #1267
- Bump Composer to version 0.23.2 by @dakinggg in #1269
- fix linting by @milocress in #1270
- Add torch 2.3.1 docker images by @dakinggg in #1275
- Make expandable segments on by default by @b-chu in #1278
- Adds CI for torch 2.3.1 by @dakinggg in #1281
- Update README.md to use variables by @milocress in #1282
- Add registry for ICL datasets by @sanjari-orb in #1252
- Fix typo in CI by @dakinggg in #1284
- Fix backwards compatibility for ICL arg by @dakinggg in #1286
- Fix packing + streaming + resumption by @dakinggg in #1277
- Dbfs HF by @KuuCi in #1214
- Bump mlflow to 2.13.2 by @KuuCi in #1285
- Add missing dependency group by @dakinggg in #1287
- Update Dockerfile with TE main by @j316chuck in #1273
- Fix TE HF checkpoint saving by @j316chuck in #1280
- added systemMetricsMonitor callback by @JackZ-db in #1260
- Extendability refactors by @dakinggg in #1290
- Small refactor for update batch size by @dakinggg in #1293
- Bump min composer version to 0.23.3 by @dakinggg in #1294
- Fix grad accum typing by @dakinggg in #1296
- Bump composer to 0.23.4 by @mvpatel2000 in #1297
- Allow passing in lbl_process_group directly by @dakinggg in #1298
- Add `all` transforms to train script by @dakinggg in #1300
- Add Retries to run_query by @KuuCi in #1302
- Bumping mlflow version to include buffering by @JackZ-db in #1303
- Ignore mosaicml logger for exception if excepthook is active by @jjanezhang in #1301
- Add curriculum learning callback by @b-chu in #1256
- Avoid circular import in hf checkpointer by @dakinggg in #1304
- Remove codeql workflow by @dakinggg in #1305
- Update CI test to v0.0.8 by @KuuCi in #1306
- Upgrade ci testing to 0.0.8 by @dakinggg in #1307
- Bump ci-testing to 0.0.9 by @dakinggg in #1310
- Fix 4 gpu tests by @dakinggg in #1311
- Bump recommended images to 2.3.1 and remove 2.3.0 CI by @dakinggg in #1312
- Provide default seed value in TrainConfig, matching EvalConfig by @mvpatel2000 in #1315
- Refactor hf checkpointer for config transformations by @irenedea in #1318
- Allows interweaving of arbitrary kinds of 'attention' layers, like sliding window, reuse prev layer kv cache etc. by @ShashankMosaicML in #1299
- Add optional logging of text output to EvalOutputLogging by @sjawhar in #1283
## New Contributors
- @sanjari-orb made their first contribution in #1252
- @JackZ-db made their first contribution in #1260
- @sjawhar made their first contribution in #1283
**Full Changelog**: v0.9.1...v0.10.0