Latest commit: P.J. Finlay, 8cdc079fa6, "Add tutorial link", 3 days ago

Argos Train

Argos Translate | Tutorial | Video tutorial

Argos Train trains an OpenNMT PyTorch Transformer model and a SentencePiece tokenizer, then packages them with Stanza data as an Argos Translate package. Argos Translate packages are zip archives with a .argosmodel extension and can be used with Argos Translate, LibreTranslate, and Dot Lexicon.
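
Because a .argosmodel file is an ordinary zip archive, its contents can be listed with Python's standard library. The following is a minimal sketch; the function name list_package_files is hypothetical and not part of Argos Train, and the sketch makes no assumption about which files a package contains.

```python
import zipfile
from pathlib import Path


def list_package_files(package_path):
    """Return the sorted entry names inside a .argosmodel zip archive.

    Hypothetical helper for illustration; it only relies on the package
    being a zip archive, as stated above.
    """
    path = Path(package_path)
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a zip archive")
    with zipfile.ZipFile(path) as archive:
        return sorted(archive.namelist())
```

For example, list_package_files("run/en_es.argosmodel") would return the archive's entry names without extracting anything.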

Pre-trained Argos Translate packages are available for download. If you have trained packages you're willing to share, please get in touch so that they can be published on the Argos Translate package index.

Training example

$ su argosopentech
$ source ~/argos-train-init

...


$ argos-train
From code (ISO 639): en
To code (ISO 639): es
From name: English
To name: Spanish
Version: 1.0

...

Package saved to /home/argosopentech/argos-train/run/en_es.argosmodel

Data

Data from data-index.json is used for training. Argos Translate primarily uses data from the Opus project.
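
To see which datasets in data-index.json apply to a language pair, the index can be filtered with a few lines of Python. This is a sketch under assumptions: it treats the file as a JSON list of objects and assumes each entry has "from_code" and "to_code" keys. Check those key names against your copy of data-index.json before relying on it. The function name find_datasets is hypothetical.

```python
import json


def find_datasets(index_path, from_code, to_code):
    """Return the index entries that match a language pair.

    Assumption: the index is a JSON list of objects carrying
    "from_code" and "to_code" keys. Entries missing either key
    are skipped rather than raising an error.
    """
    with open(index_path, encoding="utf-8") as f:
        entries = json.load(f)
    return [
        entry
        for entry in entries
        if entry.get("from_code") == from_code
        and entry.get("to_code") == to_code
    ]
```

Calling find_datasets("data-index.json", "en", "es") would then return only the English to Spanish entries.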

To train a model with custom data, run argos-train-init, then add an entry to data-index.json with a link to download your custom data package. Data packages are zipped directories with a .argosdata extension that contain a source file and a target file, with parallel data on corresponding lines, plus a metadata.json file.
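
The data package layout described above can be sketched in Python as follows. Several details here are assumptions rather than documented behavior: the file names "source" and "target", the top-level directory named after the package, and whatever keys you pass in as metadata. Compare against an existing data package from data-index.json before using this for real training data. The function name build_data_package is hypothetical.

```python
import json
import zipfile
from pathlib import Path


def build_data_package(name, source_lines, target_lines, metadata, out_dir="."):
    """Write <name>.argosdata: a zipped directory of parallel data.

    Assumptions: files are named "source" and "target" and sit inside
    a top-level directory called <name>. The metadata dict is written
    as-is; its required keys are not specified here.
    """
    # Parallel data must align line by line.
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target must have the same number of lines")
    out_path = Path(out_dir) / f"{name}.argosdata"
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(f"{name}/source", "\n".join(source_lines) + "\n")
        archive.writestr(f"{name}/target", "\n".join(target_lines) + "\n")
        archive.writestr(f"{name}/metadata.json", json.dumps(metadata, indent=4))
    return out_path
```

The resulting file then needs to be hosted somewhere downloadable so that its link can be added to data-index.json.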

Docker

A Docker image is available at argosopentech/argostrain.

docker run -it argosopentech/argostrain /bin/bash

Run training

argos-train

Environment

CUDA is required. Tested on vast.ai.
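
Before starting a long training run, it can help to confirm that an NVIDIA driver is visible on the machine. The sketch below is a rough proxy only: it checks whether the nvidia-smi tool is on PATH and exits successfully, which does not by itself prove that PyTorch can use the GPU. The function name nvidia_driver_visible is hypothetical and not part of Argos Train.

```python
import shutil
import subprocess


def nvidia_driver_visible():
    """Return True if nvidia-smi is on PATH and exits with status 0.

    This is a heuristic: it shows that a driver responds, not that
    the training stack is correctly configured for CUDA.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```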

License

Licensed under either the MIT or CC0 License (same as Argos Translate).