
Supporting other data types (e.g. video) #53

Open
shakibyzn opened this issue Feb 6, 2024 · 5 comments
Labels
question Further information is requested

Comments

@shakibyzn

Hi,

Is it possible to use Mammoth for other seq2seq problems, such as multilingual video/image captioning? What I have in mind is to prepare video features in this format (batch, n_frames, emb_size) and read them using src_embeddings in the training script while keeping the rest unchanged in the config.yml (e.g., tgt_vocab).

@TimotheeMickus
Collaborator

TimotheeMickus commented Mar 1, 2024

Hi!
Apologies for the late answer.

Other input types are currently not implemented, although we've had this request more than once. We don't really have enough hands to look into it properly; hence up until now, we've focused on text-only applications.

It should however be feasible with a reasonably small amount of changes to the codebase, depending on exactly what you're looking for. If a hacky solution and vanilla transformer layers are good enough, then you can try the following to train a model:

  1. override the function for file reading here:

    def read_examples_from_files(

    and in particular adapt the closure _make_example_dict to retrieve your data properly:

    def _make_example_dict(packed):
        """Helper function to convert lines to dicts"""
        src_str, tgt_str = packed
        return {
            'src': tokenize_fn(src_str, side='src'),
            'tgt': tokenize_fn(tgt_str, side='tgt') if tgt_str is not None else None,
            # 'align': None,
        }

    How to do that concretely depends on how your data is formatted.

  2. tweak the data collator function here:

    def collate_fn(self, examples):

    Batched tensors are expected to be sequence-first, so your features would need to end up in the shape (n_frames, batch_size, model_dim).

  3. turn off mapping input tokens to embeddings; cf. here for the encoder if I'm not mistaken:

    emb = self.embeddings(src)
    emb = emb.transpose(0, 1).contiguous()

  4. use a default sentence-level batching function (--batch_type sents). You might also need to pass some dummy variables for the source vocab, or comment out the relevant section.
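The steps above can be sketched roughly as follows. None of these function names exist in Mammoth; they are made-up stand-ins that only illustrate the shapes involved (NumPy arrays stand in for torch tensors):

```python
import numpy as np

def load_video_features(paths):
    """Step 1 stand-in: read precomputed per-clip frame features
    (each of shape (n_frames, emb_size)) instead of tokenizing text."""
    return [np.load(p) for p in paths]

def collate_video(examples, emb_size):
    """Steps 2-3 stand-in: pad clips to the longest one in the batch,
    stack them, and move to the sequence-first layout
    (n_frames, batch_size, emb_size) that the encoder expects, since
    the embedding lookup is bypassed for precomputed features."""
    max_len = max(ex.shape[0] for ex in examples)
    batch = np.zeros((len(examples), max_len, emb_size), dtype=np.float32)
    for i, ex in enumerate(examples):
        batch[i, : ex.shape[0]] = ex
    # (batch, n_frames, emb) -> (n_frames, batch, emb)
    return batch.transpose(1, 0, 2)
```

The actual collate_fn would also need to build the target-side tensors and whatever mask/length bookkeeping the rest of the pipeline expects.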

If you'd like to have a look at doing that more properly, external contributions are very much welcome!

@TimotheeMickus TimotheeMickus added the question Further information is requested label Mar 1, 2024
@shakibyzn
Author

Thank you for your detailed reply. I've been working on this for a month now and I'm able to train it properly. I haven't run many experiments with it yet, and I'm trying different hyperparameters to see if I can reach acceptable performance. One question: isn't it possible to use the loss as the criterion for early stopping?

@TimotheeMickus
Collaborator

Yes, early stopping should be supported out of the box.

Assuming you want to evaluate every 10k steps, and stop training if it no longer improves after 5 evaluation loops, then:

  1. provide a path_valid_src and a path_valid_tgt in your task definitions to enable validation loops
  2. add the following to your YAML config:
early_stopping: 5
early_stopping_criteria: ppl
valid_steps: 10000

The early stopper will evaluate perplexity on the validation dataset(s), which should be equivalent to the cross-entropy used to train the model.
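Put together, the relevant fragment of the YAML config might look like this; the task name and file paths are hypothetical placeholders, not taken from the Mammoth docs:

```yaml
tasks:
  train_en-de:
    path_src: data/train.en
    path_tgt: data/train.de
    path_valid_src: data/valid.en   # enables the validation loop
    path_valid_tgt: data/valid.de

early_stopping: 5             # stop after 5 evaluations without improvement
early_stopping_criteria: ppl  # perplexity, equivalent to training cross-entropy
valid_steps: 10000            # validate every 10k steps
```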
If you need something finer-grained than that, then you can implement a Scorer object:

class Scorer(object):

and then register it among the default scorers and scorer builders:

DEFAULT_SCORERS = [PPLScorer(), AccuracyScorer()]
SCORER_BUILDER = {"ppl": PPLScorer, "accuracy": AccuracyScorer}
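For illustration, a custom scorer could look roughly like the sketch below. The exact Scorer interface in Mammoth may differ; this sketch assumes a scorer is called with lists of hypothesis and reference strings and returns a float, and ExactMatchScorer is a made-up toy metric, not part of the library:

```python
class Scorer:
    """Minimal stand-in for the library's Scorer base class
    (assumed interface, see the actual source for the real one)."""
    def __call__(self, hypotheses, references):
        raise NotImplementedError

class ExactMatchScorer(Scorer):
    """Toy metric: fraction of hypotheses identical to their reference."""
    def __call__(self, hypotheses, references):
        matches = sum(h == r for h, r in zip(hypotheses, references))
        return matches / max(len(references), 1)
```

It would then be exposed alongside the built-ins, e.g. by adding an "exact_match" entry to SCORER_BUILDER.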

@shakibyzn
Author

Thank you.

@TimotheeMickus
Collaborator

No worries! Don't hesitate to share your code if you want us to include it in the library; we would welcome a pull request if you have something working.
