Finally, Julius seems to be more opinionated about the type of language models it uses, which should help to make it fast & reliable. Specifically, it uses "word N-gram and context-dependent HMM" via "2-pass tree-trellis search".

4/4 Fin, for now. Next: Back to GCC's optimization passes, specifically computing valid integer ranges.

And this weekend I'll summarise all these threads into a blogpost for Rhapsode, amongst other site updates.
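
For anyone curious what a "word N-gram" language model boils down to, here is a minimal sketch for N = 2 (bigrams). It is not Julius's code or file format; the function names and the tiny corpus are made up for illustration, and real models add smoothing for unseen word pairs.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows another (N = 2)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> are hypothetical start/end-of-sentence markers.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts

def probability(counts, prev, word):
    """Estimate P(word | prev) from raw counts, without smoothing."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0

# Toy corpus, invented for this example.
corpus = ["turn on the light", "turn off the light", "turn on the radio"]
model = train_bigrams(corpus)
```

A recogniser can use such probabilities to prefer likely word sequences: here `probability(model, "turn", "on")` is higher than `probability(model, "turn", "off")` simply because "turn on" occurs more often in the toy corpus.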

@vfrmedia whenever i see an odd but specific requirement surface on some spec sheet or something, it makes me think someone, somewhere, had an embarrassing situation happen one time and it had to be codified into a regulation for everyone to save face...

Mozilla DeepSpeech appears to be little more than specially trained Tensorflow neuralnets ("deep learning", since apparently neuralnets need a new name when they're too big). These tend to require much more training, but once you do they can give excellent results.

One drawback of neuralnets is that it's harder for Voice2JSON to guide it to follow a more constrained grammar. So it converts that model to a "scorer" which informs the neuralnet how well it did so it can improve next time.

3/4

Kaldi (named for the Ethiopian goatherder who discovered the coffee plant) appears at a glance to work similarly to PocketSphinx, except it supports many, many more types of language models. Including several types of neuralnets (custom implementation). Maybe even combining them?

I have heard that Kaldi's output's better, which makes sense. Interestingly both Kaldi & PocketSphinx provide GStreamer plugins, so how about exposing this automated subtitling @elementary ?

2/4?

Underlying Voice2JSON, which I've been exploring the past few days & which underlies Rhapsode's newly-added user input, is a choice of four lower-level speech-to-text backends. Today I want to briefly describe how these differ, then blog over the weekend.

---

CMU PocketSphinx is probably the fastest & arguably simplest. I don't understand any details, but it measures various aspects of the input audio & matches them to a "language model" of one of a handful of types. Including wakewords.

1/3?

Knocking off the corners: path reverse should reverse the start and end on a connected line

@alcinnz I don't think this article actually supports the thesis in the title?

The way to stop these kinds of hacks is full source code chain of custody. The article talks about PUFs and digital signatures, and then says they *don't* fix the problem. But never really says what does.

@alcinnz Wow, such a great read!🙂

Andrew is just awesome. And he is such a good writer, too.

E.g. you might want to look into this _essay_ for one of his libraries called `nakala` and the plans he has for it:
github.com/BurntSushi/nakala/b 🤯

On reacting emotionally to comments/issues/bug reports etc.: It is often very helpful to just wait a day or so _before replying_, because your emotions will have probably cooled down by then and you'll have much more clarity about the topic in question.

What is Left After Your Opensource Project is Done - @ Loup's: loup-vaillant.fr/articles/afte

I'm excited to be getting to this stage with Rhapsode!! And impressed this took less effort than wrapping WebKitGTK with important navigation aids...

I think I'd need help making sure I get this right, though I have made a decent start!

Pro tip: If your API requests take a long time just add a loading indicator with text that says "Making secure connection".

I learned that from every fintech app.

Polish university student Wojciech Kosior shares his story of how he managed to graduate without being forced to use Zoom, Skype, or other nonfree programs: u.fsf.org/3bv

Hello World! - Manuel Matuzović: matuzo.at/blog/hello-world/

Discussing building his own (non-table-laidout) personal site.

Other kernels: The scheduler
Linux: [dramatically] The Completely Fair Scheduler!

People should never be edge cases.

If your system fails because a person cannot hear sounds, your system is broken. If it fails because a person has no address, it’s broken. If it fails because their phone number comes from another country, it’s broken. If it fails because of all the reasons above, it’s still broken.

Edge cases happen, but human beings don’t deserve to find themselves in one.

FLOSS.social

For people who care about, support, or build Free, Libre, and Open Source Software (FLOSS).