Obtaining Firefox Models

Prerequisites

Make sure you have git-lfs installed, then set it up for your user account:

  git lfs install
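
If you are unsure whether git-lfs is on your PATH, a small check along these lines may help before you start the clone. This is only a sketch: require_cmd is a hypothetical helper name, not something shipped with git or git-lfs.

```shell
# require_cmd is a hypothetical helper (not part of git or git-lfs).
# Prints "found: <name>" and returns 0 if the command is on PATH,
# otherwise prints "missing: <name>" to stderr and returns 1.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}
```

For example, running `require_cmd git-lfs && git lfs version` reports the installed git-lfs version only when the binary is actually available.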

Clone the repository

  git clone https://github.com/mozilla/firefox-translations-models

Note: This can take a long time, since git-lfs has to pull all the models (~1.5 GB) from the remote.
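
If the clone is interrupted, or git-lfs was not set up beforehand, some model files may be left as small Git LFS pointer stubs instead of real data. A pointer stub is a text file whose first line starts with "version https://git-lfs.github.com/spec/v1". The sketch below looks for such stubs; the helper names is_lfs_pointer and list_lfs_pointers are hypothetical.

```shell
# Hypothetical helpers for spotting Git LFS pointer stubs.

# Returns 0 if the file begins with the LFS pointer header,
# i.e. the real content has not been downloaded yet.
is_lfs_pointer() {
  head -c 64 "$1" 2>/dev/null \
    | grep -q '^version https://git-lfs\.github\.com/spec/v1'
}

# Prints the path of every .gz file under <dir> that is still a pointer stub.
list_lfs_pointers() {
  find "$1" -type f -name '*.gz' | while IFS= read -r f; do
    if is_lfs_pointer "$f"; then
      echo "$f"
    fi
  done
}
```

Running `list_lfs_pointers firefox-translations-models/models` should print nothing after a complete clone. If it lists files, running `git lfs pull` inside the repository should fetch the missing content.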

Once the repository has been cloned, you'll find the models in firefox-translations-models/models like so:

firefox-translations-models/models/
├── dev
│   ├── bsen
│   │   ├── lex.50.50.bsen.s2t.bin.gz
│   │   ├── model.bsen.intgemm.alphas.bin.gz
│   │   └── vocab.bsen.spm.gz
│   <... Omitted for brevity ...>
│   └── nnen
│       ├── lex.50.50.nnen.s2t.bin.gz
│       ├── model.nnen.intgemm.alphas.bin.gz
│       └── vocab.nnen.spm.gz
└── prod
    ├── bgen
    │   ├── lex.50.50.bgen.s2t.bin.gz
    │   ├── model.bgen.intgemm.alphas.bin.gz
    │   └── vocab.bgen.spm.gz
    <... Omitted for brevity ...> 
    └── zhen
        ├── lex.50.50.zhen.s2t.bin.gz
        ├── model.zhen.intgemm.alphas.bin.gz
        └── vocab.zhen.spm.gz

71 directories, 206 files

Flatten the directory structure

We need all the models, vocabs and shortlists in a single directory. We also need a registry.json associating models, vocabs and shortlists with language pairs.

Use this GitHub gist to flatten the folder structure and to generate the registry.json file. Save that bash script as firefox-translations-models/models/process-firefox-translation-models.sh, then run it:

  cd firefox-translations-models/models/
  chmod +x process-firefox-translation-models.sh
  ./process-firefox-translation-models.sh

This should create a new directory called flattened containing all the models, vocabs and shortlists, plus a registry.json file next to it (outside the flattened directory).
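
To make clear what "flattening" means here, the following sketch copies every file from the per-language-pair directories into a single directory. It is NOT the gist's script: flatten_models is a hypothetical name, it does not generate registry.json, and it assumes file names are unique across dev and prod (they embed the language pair, e.g. bsen or zhen). Whether the gist also decompresses or renames files is not covered here.

```shell
# flatten_models is a hypothetical stand-in, not the script from the gist.
# Usage: flatten_models <models-dir> <dest-dir>
# Copies <models-dir>/dev/<pair>/*.gz and <models-dir>/prod/<pair>/*.gz
# into <dest-dir>. A later file with the same name overwrites an earlier one.
flatten_models() {
  src=$1
  dest=$2
  mkdir -p "$dest"
  find "$src/dev" "$src/prod" -mindepth 2 -maxdepth 2 -type f -name '*.gz' \
    -exec cp -- {} "$dest"/ \;
}
```

From inside firefox-translations-models/models/ this would be called as `flatten_models . flattened`.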

Verify your directory and registry.json

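Go through the flattened directory and registry.json and check that everything looks valid, e.g. that each language pair has its model, vocab and shortlist files and that registry.json refers to files that actually exist.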
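
Part of this check can be automated. The sketch below assumes the flattened files are still gzip-compressed, as they are in the cloned tree, and that python3 is installed for the JSON check. Both helper names are hypothetical, and neither function inspects the meaning of registry.json, only that it parses.

```shell
# Hypothetical helpers for a basic sanity check.

# Prints "corrupt: <file>" for each .gz file directly under <dir> that
# fails gzip's integrity test; returns non-zero if any file failed.
check_gz_files() {
  bad=0
  for f in "$1"/*.gz; do
    [ -e "$f" ] || continue
    if ! gzip -t "$f" 2>/dev/null; then
      echo "corrupt: $f"
      bad=1
    fi
  done
  return "$bad"
}

# Returns 0 if <file> parses as JSON (assumes python3 is available).
check_registry_json() {
  python3 -m json.tool "$1" >/dev/null
}
```

A typical run would be `check_gz_files flattened && check_registry_json registry.json`. A manual look through registry.json is still worthwhile, since a file can be valid JSON and still point at the wrong paths.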

Create the archive

Archiving with these settings will take a long time. Change the tar step if you want quicker (but less compact) compression.

  mv flattened/ firefox/ # rename the directory
  mv registry.json firefox/
  cp ../LICENSE firefox/ # copy the MPL-2.0 license into the archive directory
  time tar -Jcvf firefox-models.tar.xz firefox/

The tar step took about 30 minutes on my ThinkPad X270.

You'll find the archive ready in firefox-models.tar.xz.
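
If the xz step is too slow, there are a couple of quicker ways to run the tar step. This is a sketch only: pack_models is a hypothetical wrapper, the gz mode produces a .tar.gz rather than the .tar.xz named above, and the xz-mt mode assumes GNU tar with xz 5.2 or newer (XZ_OPT is read by the xz program itself, so a tar that compresses internally may ignore it).

```shell
# pack_models is a hypothetical wrapper around tar.
# Usage: pack_models <mode> <dir> <output-prefix>
#   gz     gzip: much faster, larger archive (<prefix>.tar.gz)
#   xz-mt  xz on all CPU cores via XZ_OPT (<prefix>.tar.xz)
#   xz     single-threaded xz, equivalent to tar -J (<prefix>.tar.xz)
pack_models() {
  mode=$1
  dir=$2
  prefix=$3
  case "$mode" in
    gz)    tar -czf "$prefix.tar.gz" "$dir" ;;
    xz-mt) XZ_OPT='-T0' tar -cJf "$prefix.tar.xz" "$dir" ;;
    xz)    tar -cJf "$prefix.tar.xz" "$dir" ;;
    *)     echo "unknown mode: $mode" >&2; return 2 ;;
  esac
}
```

For example, `pack_models xz-mt firefox firefox-models` writes firefox-models.tar.xz. Whichever mode you use, `tar -tf firefox-models.tar.xz` lists the archive contents, which is a cheap way to confirm that registry.json and LICENSE were included.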