A Natural Language Processing project for restoring Vietnamese sentence tone.

HKAB/vietnamese-tone-restoration

Vietnamese tone restoration 📰

A Natural Language Processing project for restoring Vietnamese sentence tone.

Requirements

torch, torchtext, pandas, numpy, tqdm, matplotlib, flask (latest versions)

Weights of the Transformers (BASE) model: Google Drive link

Weights of the GRU Encoder Decoder model: Google Drive link

Weights of the N-gram model: Google Drive link

Put all the weights under ./models/weights

Dataset

Dataset 1: includes 100K training sentences and 1K testing sentences from this repo. These are titles of news articles. We chose this dataset because the sentences are short enough.

Dataset 2: includes 200K training sentences and 500 testing sentences from our supervisor.
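
How the training pairs were prepared is not described here. A common way to build them for tone restoration is to strip the diacritics from each original sentence and use the stripped text as input and the original as target. The sketch below assumes that approach; `remove_tones` and `make_pair` are hypothetical names, not functions from this repo.

```python
import unicodedata


def remove_tones(text: str) -> str:
    """Strip Vietnamese diacritics (tone marks and vowel marks) from text."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category "Mn").
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # "đ" and "Đ" are standalone letters and do not decompose, so map them by hand.
    stripped = stripped.replace("\u0111", "d").replace("\u0110", "D")
    return unicodedata.normalize("NFC", stripped)


def make_pair(sentence: str) -> tuple:
    """Return (toneless input, original target) for one sentence."""
    return remove_tones(sentence), sentence


if __name__ == "__main__":
    print(make_pair("Tiếng Việt"))
```

Note that this removes both tone marks and vowel diacritics (for example the horn on "ư"), which is an assumption about what the models are expected to restore.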

Models

  1. Transformers

| Model | Hidden size | Num heads | Num layers | Learning rate | Epochs |
| --- | --- | --- | --- | --- | --- |
| Transformers (SMALL) | 128 | 4 | 2 | 0.001 | 20 |
| Transformers (BASE) | 512 | 8 | 6 | 0.0001 | 30 |

  2. GRU Encoder Decoder

| Model | Hidden size | Layers | Bidirectional | Learning rate | Epochs |
| --- | --- | --- | --- | --- | --- |
| GRU Encoder Decoder | 128 | 1 | False | 0.001 | 20 |

  3. N-gram (baseline from viblo)
  • KneserNeyInterpolated
  4. Beam search
  • Beam search size: 4
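
The README does not say how beam search is wired to each model, so the following is only a minimal sketch of the general idea with a beam size of 4: for every toneless syllable, expand each kept hypothesis with the possible toned candidates, score the extensions, and keep the 4 best. The `candidates` table, the `toy_bigram` probabilities, and `toy_score` are made-up stand-ins for a real language model.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (toned words so far, total log-prob)


def beam_search(
    tokens: Sequence[str],
    candidates: Dict[str, List[str]],
    score: Callable[[Tuple[str, ...], str], float],
    beam_size: int = 4,
) -> List[Hypothesis]:
    """Return up to `beam_size` best toned sequences, best first."""
    beams: List[Hypothesis] = [((), 0.0)]
    for tok in tokens:
        # Unknown syllables are passed through unchanged.
        options = candidates.get(tok, [tok])
        expanded = [
            (prefix + (opt,), logp + score(prefix, opt))
            for prefix, logp in beams
            for opt in options
        ]
        # Keep only the highest-scoring hypotheses.
        expanded.sort(key=lambda h: h[1], reverse=True)
        beams = expanded[:beam_size]
    return beams


# Toy data (hypothetical): candidate toned forms and a tiny bigram table.
candidates = {
    "toi": ["tôi", "tối", "tội"],
    "di": ["đi", "dì"],
    "hoc": ["học", "hóc"],
}
toy_bigram = {("<s>", "tôi"): 0.6, ("tôi", "đi"): 0.5, ("đi", "học"): 0.7}


def toy_score(prefix: Tuple[str, ...], word: str) -> float:
    """Log-probability of `word` given the previous word; unseen pairs get 0.01."""
    prev = prefix[-1] if prefix else "<s>"
    return math.log(toy_bigram.get((prev, word), 0.01))


results = beam_search(["toi", "di", "hoc"], candidates, toy_score, beam_size=4)
best_words, best_logp = results[0]
print(" ".join(best_words), best_logp)
```

A larger beam explores more alternatives at a higher cost; with a beam size of 1 this reduces to greedy decoding.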

All the deep learning models are very sensitive to the learning rate. In general, we found that the bigger the model, the smaller the learning rate should be.

Results

Accuracy: mean of the per-sentence accuracy over all test sentences
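
One plausible reading of this metric is sketched below: compute the fraction of correctly restored words in each sentence, then average those fractions over the test set. Word-level matching on whitespace tokens is an assumption; the repo's evaluation code may compare characters or whole sentences instead, and the function names are hypothetical.

```python
from typing import Sequence


def sentence_accuracy(pred: str, target: str) -> float:
    """Fraction of positions where the predicted word equals the target word."""
    p, t = pred.split(), target.split()
    if not p and not t:
        return 1.0
    correct = sum(a == b for a, b in zip(p, t))
    # Dividing by the longer length penalises missing or extra words.
    return correct / max(len(p), len(t))


def mean_accuracy(preds: Sequence[str], targets: Sequence[str]) -> float:
    """Average of the per-sentence accuracies."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        raise ValueError("need at least one sentence")
    return sum(sentence_accuracy(p, t) for p, t in zip(preds, targets)) / len(preds)


if __name__ == "__main__":
    preds = ["tôi đi học", "trời mua"]
    targets = ["tôi đi học", "trời mưa"]
    print(mean_accuracy(preds, targets))
```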

| Model | Accuracy on Dataset 1 | Accuracy on Dataset 2 |
| --- | --- | --- |
| Transformers (SMALL) | 0.742 | 0.766 |
| GRU Encoder Decoder | 0.712 | 0.661 |
| N-gram | 0.722 | 0.813 |

We also trained the Transformers (BASE) model on 2M sentences from Dataset 1 and ran one-shot prediction on Dataset 2.

| Model | Accuracy on Dataset 1 | Accuracy on Dataset 2 |
| --- | --- | --- |
| Transformers (BASE) | 0.937 | 0.818 (one shot) |

Web UI

We use Flask to build a web page for an interactive experience. It can show the attention of the deep learning models and the score of each result in the n-gram solution. Run it with:

export FLASK_APP=inference.py
flask run

Then open index.html

A picture of a soldier trying to kill the tone-remover criminal

Three weapons you can use in this war

Result

Authors

References
