Note: The instructions below are meant for packagers.
Prerequisites
- Make sure you have git-lfs installed, then enable it for your user account (once, before cloning):
git lfs install
Clone the repository
git clone https://github.com/mozilla/firefox-translations-models
Note: This will take a long time, because git-lfs has to download all the models (~1.5 GB) from the remote.
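Sometimes git-lfs was not set up before cloning. In that case the .gz files under firefox-translations-models/models are left as small text pointer files instead of the actual models. The sketch below is one way to spot them. It assumes the standard Git LFS pointer format, whose first line is version https://git-lfs.github.com/spec/v1. The function name find_lfs_pointers is hypothetical, and the script needs bash.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every file under ROOT that is still a Git LFS
# pointer stub rather than the downloaded model data.
find_lfs_pointers() {
  local root="$1" f
  # Pointer files are tiny (well under 1 KiB), so only small files are inspected.
  # -print0 / read -d '' keeps unusual file names intact.
  while IFS= read -r -d '' f; do
    # A pointer file starts with the LFS spec line.
    if head -c 100 "$f" | grep -q '^version https://git-lfs.github.com/spec/v1'; then
      printf '%s\n' "$f"
    fi
  done < <(find "$root" -type f -size -2k -print0)
}
```

Run find_lfs_pointers models from inside firefox-translations-models. Any path it prints was never downloaded, and running git lfs pull in the repository should fetch the missing content.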
Once the repository has been cloned, you'll find the models in firefox-translations-models/models, laid out like this:
firefox-translations-models/models/
├── tiny
│   ├── bsen
│   │   ├── lex.50.50.bsen.s2t.bin.gz
│   │   ├── model.bsen.intgemm.alphas.bin.gz
│   │   └── vocab.bsen.spm.gz
│   <... Omitted for brevity ...>
│   └── nnen
│       ├── lex.50.50.nnen.s2t.bin.gz
│       ├── model.nnen.intgemm.alphas.bin.gz
│       └── vocab.nnen.spm.gz
└── base
    ├── bgen
    │   ├── lex.50.50.bgen.s2t.bin.gz
    │   ├── model.bgen.intgemm.alphas.bin.gz
    │   └── vocab.bgen.spm.gz
    <... Omitted for brevity ...>
    └── zhen
        ├── lex.50.50.zhen.s2t.bin.gz
        ├── model.zhen.intgemm.alphas.bin.gz
        └── vocab.zhen.spm.gz

71 directories, 206 files
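Before moving on, it can help to confirm that every language-pair directory holds a complete set of files. The sketch below assumes all pairs follow the naming in the tree above: lex.50.50.<pair>.s2t.bin.gz, model.<pair>.intgemm.alphas.bin.gz and vocab.<pair>.spm.gz. Any pair that names its files differently will be reported as missing and needs a manual look. check_model_dirs is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that every <size>/<pair> directory under ROOT
# contains the three files named in the tree. ASSUMPTION: all pairs use that
# naming scheme; anything else is reported for manual review.
check_model_dirs() {
  local root="$1" dir pair f missing=0
  for dir in "$root"/*/*/; do
    [ -d "$dir" ] || continue        # skip if the glob matched nothing
    pair=$(basename "$dir")          # e.g. "bsen"
    for f in "lex.50.50.$pair.s2t.bin.gz" \
             "model.$pair.intgemm.alphas.bin.gz" \
             "vocab.$pair.spm.gz"; do
      if [ ! -f "$dir$f" ]; then
        printf 'missing: %s%s\n' "$dir" "$f"
        missing=1
      fi
    done
  done
  return "$missing"                  # non-zero if anything was missing
}
```

Run check_model_dirs models from the repository root. It prints one line per missing file and returns a non-zero status if anything was reported.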
Flatten the directory structure
We need all the models, vocabs and shortlists in a single directory.
We also need a registry.json file that associates the models, vocabs and shortlists with their language pairs.
Use this GitHub gist to flatten the folder structure and generate the registry.json file.
Save the script from the gist as firefox-translations-models/process-firefox-translation-models.py, then run it:
cd firefox-translations-models
chmod +x process-firefox-translation-models.py
./process-firefox-translation-models.py
This should create a new directory called flattened-models containing all the models, vocabs and shortlists, a registry.json file, and a copy of the LICENSE file.
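The gist itself isn't reproduced here. As a rough illustration of what the flattening part amounts to, here is a minimal sketch. It copies every .gz file from the <size>/<pair> directories into one output directory and stops if two files would end up with the same name. It does not generate registry.json, whose exact format is defined by the gist. flatten_models is a hypothetical name, not something the gist provides.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the flattening step only (no registry.json):
# copy every *.gz under SRC into DEST, refusing to overwrite on a name clash.
flatten_models() {
  local src="$1" dest="$2" f name
  mkdir -p "$dest" || return 1
  while IFS= read -r -d '' f; do
    name=$(basename "$f")
    # File names carry the pair but not the size folder, so the same pair
    # present under both tiny/ and base/ would collide here.
    if [ -e "$dest/$name" ]; then
      printf 'name clash: %s\n' "$name" >&2
      return 1
    fi
    cp "$f" "$dest/$name" || return 1
  done < <(find "$src" -type f -name '*.gz' -print0)
}
```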
Verify your directory and registry.json
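Go through the directory and registry.json and check that everything looks valid: every language pair should have its model, vocab and shortlist, and every file listed in registry.json should actually be in the directory.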
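Part of this check can be automated. The layout of registry.json comes from the gist, so the sketch below makes only one assumption about it: file names appear as quoted strings ending in .gz, possibly with a path in front. It reports files the registry names but the directory lacks, and .gz files in the directory that the registry never mentions. check_registry is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical cross-check between a flattened directory and its registry.json.
# ASSUMPTION: registry.json refers to files as quoted strings ending in ".gz".
check_registry() {
  local dir="$1" status=0 name f listed
  [ -f "$dir/registry.json" ] || { echo 'no registry.json'; return 1; }
  # Collect referenced names, stripping quotes and any leading path.
  listed=$(grep -o '"[^"]*\.gz"' "$dir/registry.json" | tr -d '"' | sed 's#.*/##' | sort -u)
  # 1) Every referenced file must exist in the directory.
  while IFS= read -r name; do
    [ -n "$name" ] || continue
    if [ ! -f "$dir/$name" ]; then
      printf 'listed but missing: %s\n' "$name"
      status=1
    fi
  done <<< "$listed"
  # 2) Every .gz file in the directory should be referenced somewhere.
  for f in "$dir"/*.gz; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    if ! grep -qxF "$name" <<< "$listed"; then
      printf 'present but unlisted: %s\n' "$name"
      status=1
    fi
  done
  return "$status"
}
```

Run check_registry flattened-models after the script finishes. No output and a zero exit status means the two agree, at least as far as file names go.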
Create the archive
Archiving with these settings takes a long time; change the tar step if you want faster compression.
mv flattened-models/ firefox/ # rename the directory
time tar -Jcvf firefox-models.tar.xz firefox/
The tar step took about 30 minutes on my ThinkPad X270, but it should take under 5 minutes on a more powerful machine.
You'll find the finished archive in firefox-models.tar.xz.
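If the single-threaded xz run is too slow, one option that keeps the .tar.xz format is to let xz use every CPU core. tar -J hands compression to xz, and xz reads extra options from the XZ_OPT environment variable. The option -T0 asks for one thread per core and needs xz 5.2 or newer. The sketch below wraps this in a hypothetical archive_models helper and then lists the first entries as a quick sanity check. The model files are already gzip-compressed, so a second compression pass probably saves little space. A plain uncompressed tar -cf is another option if size is not critical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: archive DIR into OUT (.tar.xz) using all CPU cores.
archive_models() {
  local dir="$1" out="$2"
  # xz picks up XZ_OPT when tar -J runs it; -T0 means one thread per core.
  XZ_OPT=-T0 tar -Jcf "$out" "$dir" || return 1
  # Quick sanity check: show the first few entries of the new archive.
  tar -tJf "$out" | head -n 5
}
```

After the rename step, archive_models firefox/ firefox-models.tar.xz can stand in for the time tar -Jcvf line.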