heliport

A language identification tool that aims to be both fast and accurate. It started as a port of HeLI-OTS to Rust.

Installation

From PyPI

Install the package in your environment:

pip install heliport

Then download the model:

heliport-download

From source

Install the requirements:

Python
pip
Rust
OpenSSL

Clone the repo, build the package, and compile the model:

git clone https://github.com/ZJaume/heliport
cd heliport
pip install .
heliport-convert

Usage

CLI

Run the heliport command, which reads lines from stdin and prints the identified language for each one:

cat sentences.txt | heliport

eng_latn
cat_latn
rus_cyrl
...
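The same pipe can be driven from a script. The sketch below sends lines to the command over stdin and collects the stdout lines. The helper name label_lines and its command parameter are illustrative and not part of heliport; it assumes heliport is on the PATH and prints exactly one output line per input line, as in the example above.

```python
import subprocess
from typing import Sequence


def label_lines(lines: Sequence[str], command: Sequence[str] = ("heliport",)) -> list[str]:
    """Pipe `lines` to `command` on stdin and return its stdout lines.

    Hypothetical helper, not part of heliport. It assumes the command reads
    one sentence per line and prints one result per line.
    """
    # An embedded newline would split one sentence into two input lines.
    payload = "".join(line.replace("\n", " ") + "\n" for line in lines)
    result = subprocess.run(
        list(command),
        input=payload,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.splitlines()
```

Calling label_lines(["L'aigua clara"]) would then return a list with one entry per input sentence. Passing the command as a parameter keeps the helper usable with a full path or a wrapper script.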

Python package

>>> from heliport import Identifier
>>> i = Identifier()
>>> i.identify("L'aigua clara")
'cat_latn'
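To label a whole file, identify can be called once per line. The sketch below tallies the predicted labels; it relies only on the identify(text) call shown above returning a label string. The names count_languages and SupportsIdentify are hypothetical, and whether heliport offers a dedicated batch API is not covered here.

```python
from collections import Counter
from typing import Iterable, Protocol


class SupportsIdentify(Protocol):
    """Anything exposing the identify(text) method shown above."""

    def identify(self, text: str) -> str: ...


def count_languages(identifier: SupportsIdentify, lines: Iterable[str]) -> Counter:
    """Count predicted labels over `lines`, skipping blank lines.

    Hypothetical helper: it only assumes identify(text) returns a label string.
    """
    counts: Counter = Counter()
    for line in lines:
        text = line.strip()
        if text:
            counts[identifier.identify(text)] += 1
    return counts
```

With heliport installed, passing Identifier() and an open file handle for sentences.txt should yield a Counter keyed by labels such as cat_latn.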

Rust crate

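use std::sync::Arc;
use heliport::identifier::Identifier;
use heliport::lang::Lang;
use heliport::load_models;

// Load the character and word models from the model directory.
let (charmodel, wordmodel) = load_models("/dir/to/models");
// Each model is wrapped in an Arc before being handed to the identifier.
let identifier = Identifier::new(
    Arc::new(charmodel),
    Arc::new(wordmodel),
);
// identify returns the predicted language together with a score.
let (lang, score) = identifier.identify("L'aigua clara");
assert_eq!(lang, Lang::cat_Latn);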

Benchmarks

Speed benchmark on 100k random sentences from OpenLID, with all tools running single-threaded:

tool	time (s)
CLD2	1.12
HeLI-OTS	60.37
lingua all high preloaded	56.29
lingua all low preloaded	23.34
fasttext openlid193	8.44
heliport	2.33
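The timings are easier to compare as throughput and as ratios against heliport. The snippet below only rearranges the numbers from the table; it assumes "100k" means exactly 100,000 sentences.

```python
# Wall-clock seconds copied from the table above (single-threaded runs).
SENTENCES = 100_000  # assumption: "100k" is exactly 100,000 sentences
seconds = {
    "CLD2": 1.12,
    "HeLI-OTS": 60.37,
    "lingua all high preloaded": 56.29,
    "lingua all low preloaded": 23.34,
    "fasttext openlid193": 8.44,
    "heliport": 2.33,
}

# Throughput in sentences per second.
throughput = {tool: SENTENCES / t for tool, t in seconds.items()}

# Time taken relative to heliport (values below 1 mean the tool is faster).
relative = {tool: t / seconds["heliport"] for tool, t in seconds.items()}

# Print the tools from fastest to slowest.
for tool in sorted(seconds, key=seconds.get):
    print(f"{tool:28s} {throughput[tool]:>10,.0f} sent/s  {relative[tool]:6.2f}x heliport's time")
```

On these figures heliport handles roughly 42,900 sentences per second and finishes about 26 times sooner than HeLI-OTS, while CLD2 is still about twice as fast as heliport.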
