No description provided.
This is not very stable; it is more likely to speak Cantonese.
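Thanks for the reply. Yes, this is the part that puzzles me. The BPE tokens for Cantonese look no different from those for Mandarin, yet the two are clearly pronounced differently. In the data you trained on, did you prepend <|yue|> to the Cantonese text? If I want to add another dialect that is also written in Chinese characters, such as Sichuanese, should I prepend a similar tag like <|chuan|> to the training text and then modify the LANGUAGES dictionary in whisper/tokenizer.py?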
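For concreteness, the approach proposed in the question could be sketched roughly as follows. This is only an illustration of the idea, not the project's confirmed training recipe: the `LANGUAGES` dictionary here is a small stand-in for the one in whisper/tokenizer.py, the code `chuan` is the hypothetical tag from the question, and the helpers `add_language` and `tag_transcript` are invented for this example.

```python
# Stand-in for the LANGUAGES dictionary in whisper/tokenizer.py.
# Only a few entries are shown; the real dictionary is much longer.
LANGUAGES = {
    "en": "english",
    "zh": "chinese",
    "yue": "cantonese",
}


def add_language(languages: dict, code: str, name: str) -> dict:
    """Return a copy of the language table with one extra entry (hypothetical helper)."""
    if code in languages:
        raise ValueError(f"language code {code!r} is already registered")
    updated = dict(languages)
    updated[code] = name
    return updated


def language_tag(code: str) -> str:
    """Format a language code in the <|code|> special-token style."""
    return f"<|{code}|>"


def tag_transcript(text: str, code: str, languages: dict) -> str:
    """Prepend the language tag to a training transcript (hypothetical helper)."""
    if code not in languages:
        raise KeyError(f"unknown language code {code!r}")
    return language_tag(code) + text


# "chuan" is an assumed code for Sichuanese, taken from the question above.
extended = add_language(LANGUAGES, "chuan", "sichuanese")
sample = tag_transcript("你好", "chuan", extended)
```

Whether tagging the text this way is enough for the model to learn a distinct pronunciation is exactly what the question asks, and the sketch does not settle it. A new tag would presumably also have to be registered as a special token with its own embedding, which this example does not cover.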