All right, kind of a lame title for this article, especially when it's about two, maybe even three, rather exciting things at memoQ. (The-company-formerly-known-as-Kilgray is now named after its flagship product. That might be a good thing, though I do remember seeing the first version of their website with that screaming guy right next to the company name and thinking, whoa, these people are different!
And what exactly is SkyCAT? ;-) )
Anyway, the memoQ of 2018 has two really interesting new features.
One is the mobile app Hey memoQ, still in its pre-beta stage (you can sign up to be part of the beta group right here). One interesting aspect of Hey memoQ is that it's a mobile app; the world of translation has been late to the game in adopting mobile apps that actually make you more productive, so it's been fun to see them finally coming out of the woodwork. The other intriguing aspect is what it does: it's a voice recognition tool that works in something like 86 languages and dialects. (I'm not going to list them here, but you can see them by following the link above.) The app lets you dictate into your phone and have the text transcribed on your PC, which is similar to what Tiago Neto has been doing by cobbling together a whole bunch of tools and resources, only now it's more streamlined and tool-specific.
Let's step back a little, though.
As you can see from the link, memoQ is using the Nuance Recognizer (Nuance is the company behind Dragon, the premier voice recognition product, though a very limited one as far as the number of supported languages goes). It accesses this through the Apple Speech Recognition SDK (SDK = software development kit), so yes, you've probably drawn the right conclusion: the app is available only for iOS at this point. Gergely Vándor from memoQ said it's likely that there will be an Android version at a later point if this proves to be a successful first implementation. The idea for the app is about a year and a half old and comes out of memoQ's "Innovations" department, headed by Gábor Ugray, one of the company's founders.
The system is set up so that the phone app on your iPhone or iPad talks to a proxy server, which in turn communicates with both memoQ on your computer and Nuance's speech recognition server (through the above-mentioned Apple SDK). There is also some data traffic going from your memoQ installation back to the speech recognition server via a "hint" feature that sends segment-specific termbase data to Nuance to increase recognition accuracy (that way "I" does not become "eye" or "aye" in an English context). According to Gergely, this "hint" feature is a bit of a "black box" for memoQ, so it may or may not prove useful, and there will likely be an option to deactivate it.
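memoQ hasn't published the protocol behind this, so purely as an illustration of the data flow described above, a "hint" request might conceptually look like the sketch below. Every name here (the function, the field names, the audio identifier) is invented for this example; none of it comes from memoQ or Nuance.

```python
# Hypothetical sketch of a "hint"-style recognition request.
# All names are invented for illustration; this only models the
# idea of bundling segment-specific terms with the audio.

def build_recognition_request(audio_id, segment_terms):
    """Bundle an audio reference with segment-specific term hints.

    Sending along the terms that occur in the current segment lets a
    recognizer bias its results toward them, so that "I" is less
    likely to come back as "eye" or "aye" in context.
    """
    return {
        "audio": audio_id,
        "hints": sorted(set(t.lower() for t in segment_terms)),
    }

request = build_recognition_request(
    "seg-042.wav",
    ["UI", "I-beam", "termbase", "termbase"],  # duplicates collapsed
)
print(request["hints"])
```

The point of the sketch is simply that the hints are per-segment data riding along with the audio, which is also why a privacy-conscious user might want the promised option to switch them off.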
The integration with memoQ goes further with a set of voice commands ("next segment," "select XYZ," etc.), which will also likely be extended in the future (and the upcoming beta phase should give the developers some clues about which kinds of commands are commonly used and which are not).
Can I let my enthusiastic self out for a little bit?
I love this tool!
I haven't tried it out myself yet, but here's what I think is so cool about it: it's often been said by others (and by me) that voice recognition is the underrepresented productivity booster for certain kinds of translators and certain kinds of translation. The strange and somewhat frustrating thing about voice recognition is that it really does not mesh well with other features provided by translation environment tools. AutoWrite and AutoSuggest wait for data that comes from single keystrokes, assemble features assume that it's sometimes quicker to rearrange existing content than to translate from scratch, fragment-based machine translation typically uses processes similar to AutoWrite, and so on and so forth. But I'm hopeful that a translation environment developer who is right in the midst of all this will be able to find ways to mitigate some of those problems. Since it clearly does not make sense to forgo one productivity feature to gain another, there needs to be a way to combine them. That's what I eventually hope to see from this.
And then there's the fact that dictation is suddenly open to so many more languages and that it's free (which is an advantage even for those who dictate in those few languages covered by Dragon).
One issue that probably still needs to be addressed in some way is privacy with cloud-based voice recognition. Apple states on the website for its SDK:
"Do not perform speech recognition on private or sensitive information. Some speech is simply not appropriate for recognition. Avoid sending passwords, health or financial data, and other sensitive speech for recognition."
Okay, then. And Nuance -- for a different product -- says:
"By using Dragon Anywhere, you expressly consent and agree that speech data, which may contain personal information, shall be stored and processed in the United States. "Speech data" means the audio files, associated text, transcriptions and log files provided by you or generated in connection with Nuance products."
So it looks like voice recognition providers are not quite at the point machine translation providers arrived at earlier this year (and let's say this all together: "Thank you, GDPR!"), but that might be only a matter of time.
One thing that surprised me about Hey memoQ was that memoQ chose to use the Nuance products. Some of you will have read that earlier this year Nuance discontinued its Swype keyboards for both Android and iOS, which had also provided free access to voice recognition. I'm not completely sure that the reason was the powerful rise of Google's Gboard, but chances are it was. Gboard provides keyboard as well as voice access to hundreds and hundreds of languages. It's not clear whether all the languages listed (click on "See supported languages" at the bottom) are voice-supported, but either way the list is much, much longer than Nuance's, and likely has a more secure future. Maybe the next version of Hey memoQ will (have to) make that switch.
The second (and third) feature I like in memoQ is the video preview tool. memoQ is not the first with a tool like this (Star Transit has had one for quite a while, and Wordbee's came out essentially simultaneously with memoQ's), but it's still important and kind of a no-brainer given the incredible rise in subtitle translation. The tool is based on the VLC Media Player -- which you're likely already familiar with because it's probably installed on your computer anyway -- and supports essentially any video format that player supports. The only caveat is that you need not just the video but also a separate subtitle file (either an SRT file or an Excel file that contains the translatable text as well as the time stamps). The information in that subtitle file determines which position of the video is shown as you translate and as you review your translated subtitles in the preview. You can also choose to play longer passages spanning a number of subtitles to get a better idea of the context in the video.
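To make the subtitle-file requirement concrete: SRT is a plain-text format where each cue carries an index, a start/end time stamp pair, and the subtitle text, and those time stamps are exactly what lets a preview tool jump to the right spot in the video. Here's a minimal sketch of what such a file looks like and how its cues could be pulled apart -- this is just the generic SRT format, nothing memoQ-specific.

```python
import re

def parse_srt(text):
    """Parse a minimal SRT file into (start, end, text) entries.

    Each SRT cue is an index line, a "start --> end" time stamp
    line, and one or more text lines, with blank lines between
    cues. Multi-line cue text is joined into a single string.
    """
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed cues
        start, end = [t.strip() for t in lines[1].split("-->")]
        entries.append((start, end, " ".join(lines[2:])))
    return entries

sample = """\
1
00:00:01,000 --> 00:00:03,500
Hello, world!

2
00:00:04,000 --> 00:00:06,000
This line spans
two rows.
"""

print(parse_srt(sample))
```

The time stamps (hours:minutes:seconds,milliseconds) are what a preview tool synchronizes against; the text lines are what you'd actually translate.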
In addition, since memoQ built this on the VLC Media Player, it had to open-source the code for the preview tool. By the time you receive this Tool Box Journal, the code should have been posted to GitHub, and it might really be very useful. For instance, it could easily be used to build a preview/synchronization tool for video games, for software localization, for other translation management systems, and so on.
And maybe, just maybe, this is the first step to a library of third-party apps that memoQ might offer at some point?
Of course, Gergely is right when he says that memoQ "should focus on core translation technology" and leave the -- albeit important -- rest to others.