Obtaining Firefox Models (for packagers only)

Note: The instructions below are meant for packagers.

Prerequisites

  1. Make sure you have git-lfs installed, then enable it for your user account:
  git lfs install
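
To confirm that git can actually find the extension before starting the large clone, a check along these lines may help. This is only a sketch: have_git_lfs is a made-up helper name, and the only real command it relies on is git lfs version.

```shell
# Sketch: report whether the git-lfs extension is available to git.
# have_git_lfs is a hypothetical helper name, not part of git or git-lfs.
have_git_lfs() {
    git lfs version >/dev/null 2>&1
}

if have_git_lfs; then
    echo "git-lfs found: $(git lfs version)"
else
    echo "git-lfs is missing; install it with your package manager first" >&2
fi
```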

Clone the repository

  git clone https://github.com/mozilla/firefox-translations-models

Note: This will take a long time because git-lfs has to pull all the models (~1.5 GB) from the remote.
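
If the clone is interrupted, or git-lfs was not set up beforehand, some model files may be left as small text pointer stubs instead of the real binaries. The sketch below looks for such stubs by checking for the prefix that git-lfs pointer files start with. find_lfs_pointers is a hypothetical helper name, and the default path assumes it is run from inside the cloned repository.

```shell
# Sketch: list .gz files under a directory that are still git-lfs pointer stubs.
# find_lfs_pointers is a hypothetical helper name, not a git-lfs command.
find_lfs_pointers() {
    dir="${1:-models}"
    find "$dir" -type f -name '*.gz' | while IFS= read -r f; do
        # Pointer stubs are small text files whose first line starts with this prefix;
        # real model files are gzip data and will not match.
        if head -c 40 "$f" | LC_ALL=C grep -q '^version https://git-lfs'; then
            printf '%s\n' "$f"
        fi
    done
}
```

Running find_lfs_pointers models from the repository root should print nothing when every model was downloaded. If it prints any paths, git lfs pull should fetch the missing content.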

Once the repository has been cloned, you'll find the models in firefox-translations-models/models like so:

firefox-translations-models/models/
├── tiny
│   ├── bsen
│   │   ├── lex.50.50.bsen.s2t.bin.gz
│   │   ├── model.bsen.intgemm.alphas.bin.gz
│   │   └── vocab.bsen.spm.gz
│   <... Omitted for brevity ...>
│   └── nnen
│       ├── lex.50.50.nnen.s2t.bin.gz
│       ├── model.nnen.intgemm.alphas.bin.gz
│       └── vocab.nnen.spm.gz
└── base
    ├── bgen
    │   ├── lex.50.50.bgen.s2t.bin.gz
    │   ├── model.bgen.intgemm.alphas.bin.gz
    │   └── vocab.bgen.spm.gz
    <... Omitted for brevity ...> 
    └── zhen
        ├── lex.50.50.zhen.s2t.bin.gz
        ├── model.zhen.intgemm.alphas.bin.gz
        └── vocab.zhen.spm.gz

71 directories, 206 files

Flatten the directory structure

We need all the models, vocabs and shortlists in a single directory. We also need a registry.json associating models, vocabs and shortlists with language pairs.

Use this GitHub gist to flatten the folder structure and generate the registry.json file. Save the script from the gist as firefox-translations-models/process-firefox-translation-models.py

  cd firefox-translations-models
  chmod +x process-firefox-translation-models.py
  ./process-firefox-translation-models.py

This should create a new directory called flattened-models containing all the models, vocabs and shortlists, along with registry.json and a copy of the LICENSE file.

Verify your directory and registry.json

Go through the directory and registry.json and check that every language pair in registry.json points to model, vocab and shortlist files that actually exist in the directory.
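
Part of this check can be automated. The sketch below only covers the mechanical parts: that registry.json parses as JSON, that every .gz file passes gzip's integrity test, and that the directory really is flat. It does not inspect the contents of registry.json, because its schema is defined by the gist script. check_flattened_models is a hypothetical helper name, and the sketch assumes gzip, python3 and a find that supports -mindepth are available.

```shell
# Sketch: basic sanity checks for the flattened directory.
# check_flattened_models is a hypothetical helper; it assumes gzip and python3 are installed.
check_flattened_models() {
    dir="${1:-flattened-models}"
    status=0
    # registry.json must exist and parse as JSON (its schema is not checked here).
    if ! python3 -m json.tool "$dir/registry.json" >/dev/null 2>&1; then
        echo "registry.json is missing or not valid JSON" >&2
        status=1
    fi
    # Every compressed file must pass gzip's integrity test.
    for f in "$dir"/*.gz; do
        [ -e "$f" ] || continue
        if ! gzip -t "$f" 2>/dev/null; then
            echo "corrupt or not gzip: $f" >&2
            status=1
        fi
    done
    # The directory is supposed to be flat, so subdirectories are suspicious.
    if [ -n "$(find "$dir" -mindepth 1 -type d)" ]; then
        echo "unexpected subdirectory in $dir" >&2
        status=1
    fi
    return "$status"
}
```

Calling check_flattened_models with no argument checks ./flattened-models and returns a non-zero status if any of the checks fail. Which language pairs map to which files still has to be reviewed by hand.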

Create the archive

Archiving with these settings will take a long time. Change the tar step if you want quicker compression.

  mv flattened-models/ firefox/ # rename the directory
  time tar -Jcvf firefox-models.tar.xz firefox/

The tar step took about 30 minutes on my ThinkPad X270, but it should take less than 5 minutes on a faster machine.
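
One way to shorten the tar step on a multi-core machine is to let xz use several threads. The sketch below does that through the XZ_OPT environment variable, which xz reads when tar invokes it, and then lists the archive back as a basic integrity check. make_models_archive is a hypothetical helper name, and the sketch assumes GNU tar with xz 5.2 or newer. Multi-threaded xz may produce a slightly larger file than the single-threaded run.

```shell
# Sketch: create the archive with multi-threaded xz and then list it back as a check.
# make_models_archive is a hypothetical helper; it assumes GNU tar and xz 5.2 or newer.
make_models_archive() {
    src="${1:-firefox}"
    out="${2:-firefox-models.tar.xz}"
    # XZ_OPT is read by xz; -T0 lets it use one thread per CPU core.
    XZ_OPT='-T0' tar -Jcf "$out" "$src" || return 1
    # Listing the archive decompresses it end to end, so truncation or corruption shows up here.
    tar -Jtf "$out" >/dev/null
}
```

Running make_models_archive firefox firefox-models.tar.xz after the rename produces the same archive name as the commands above.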

You'll find the archive ready in firefox-models.tar.xz.