All right, kind of a lame title for this article, especially when it's about two, maybe even three, rather exciting things at memoQ. (The-company-formerly-known-as-Kilgray is now named after its flagship product. That might be a good thing, though I do remember seeing the first version of their website with that screaming guy right next to the company name and thinking, whoa, these people are different!
And what exactly is SkyCAT? ;-) )
Anyway, the memoQ of 2018 has two really interesting new features.
One is the mobile app Hey memoQ, still in its pre-beta stage (you can sign up to be part of the beta group right here). One interesting aspect of Hey memoQ is that it's a mobile app; the world of translation has been late to the game in adopting mobile apps that actually make you more productive, so it's been fun to see them finally coming out of the woodwork. The other intriguing aspect is what it does: it's a voice recognition tool that works in something like 86 languages and dialects. (I'm not going to list them here, but you can see them by following the link above.) The app lets you dictate into your phone and have the text transcribed on your PC, which is similar to what Tiago Neto has been doing by cobbling together a whole bunch of tools and resources, only now it's more streamlined and tool-specific.
Let's step back a little, though.
As you can see from the link, memoQ is using the Nuance Recognizer (Nuance is the company behind Dragon, the premier voice recognition product, though a very limited one as far as the number of supported languages goes). It accesses this through the Apple Speech Recognition SDK (SDK = software development kit), so yes, you've probably drawn the right conclusion: the app is available only for iOS at this point. Gergely Vándor from memoQ said it's likely that there will be an Android version at a later point if this proves to be a successful first implementation. The idea for the app is about a year and a half old and comes out of memoQ's "Innovations" department, headed by Gábor Ugray, one of the company's founders.
The system is set up so that the phone app on your iPhone or iPad talks to a proxy server, which in turn communicates with both memoQ on your computer and Nuance's speech recognition server (through the above-mentioned Apple SDK). There is also some data traffic going from your memoQ installation back to the speech recognition server via a "hint" feature that sends segment-specific termbase data to Nuance to increase recognition accuracy (that way "I" does not become "eye" or "aye" in an English context). According to Gergely, this "hint" feature is a bit of a "black box" for memoQ, so it may or may not prove useful, and there will likely be an option to deactivate it.
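memoQ hasn't published the protocol behind this, so purely as an illustration of the data flow described above, a "hint" request might conceptually look like the sketch below. Every name here (the function, the field names, the audio identifier) is invented for this example; none of it comes from memoQ or Nuance.

```python
# Hypothetical sketch of a "hint"-style recognition request.
# All names are invented for illustration; this only models the
# idea of bundling segment-specific terms with the audio.

def build_recognition_request(audio_id, segment_terms):
    """Bundle an audio reference with segment-specific term hints.

    Sending along the terms that occur in the current segment lets a
    recognizer bias its results toward them, so that "I" is less
    likely to come back as "eye" or "aye" in context.
    """
    return {
        "audio": audio_id,
        "hints": sorted(set(t.lower() for t in segment_terms)),
    }

request = build_recognition_request(
    "seg-042.wav",
    ["UI", "I-beam", "termbase", "termbase"],  # duplicates collapsed
)
print(request["hints"])
```

The point of the sketch is simply that the hints are per-segment data riding along with the audio, which is also why a privacy-conscious user might want the promised option to switch them off.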
The integration with memoQ goes further with a set of voice commands ("next segment," "select XYZ," etc.), which will also likely be extended in the future (and the upcoming beta phase should give the developers some clues about which kinds of commands are commonly used and which are not).
Can I let my enthusiastic self out for a little bit?
I love this tool!
I haven't tried it out myself yet, but here's what I think is so cool about it: it's often been said by others (and by me) that voice recognition is the underrepresented productivity booster for certain kinds of translators and certain kinds of translation. The strange and somewhat frustrating thing about voice recognition is that it really does not mesh well with other features provided by translation environment tools. AutoWrite and AutoSuggest wait for data that comes from single keystrokes, assemble features assume that it's sometimes quicker to rearrange existing content than to translate from scratch, fragment-based machine translation typically uses processes similar to AutoWrite, and so on and so forth. But I'm hopeful that a translation environment developer who is right in the midst of all this will be able to find ways to mitigate some of those problems. Since it clearly does not make sense to forgo one productivity feature to gain another, there needs to be a way to combine them. That's what I eventually hope to see from this.
And then there's the fact that dictation is suddenly open to so many more languages and that it's free (which is an advantage even for those who dictate in those few languages covered by Dragon).
One issue that probably still needs to be addressed in some way is privacy with cloud-based voice recognition. Apple states on the website for its SDK:
"Do not perform speech recognition on private or sensitive information. Some speech is simply not appropriate for recognition. Avoid sending passwords, health or financial data, and other sensitive speech for recognition."
Okay, then. And Nuance -- for a different product -- says:
"By using Dragon Anywhere, you expressly consent and agree that speech data, which may contain personal information, shall be stored and processed in the United States. "Speech data" means the audio files, associated text, transcriptions and log files provided by you or generated in connection with Nuance products."
So it looks like voice recognition providers are not quite at the point machine translation providers arrived at earlier this year (and let's say this all together: "Thank you, GDPR!"), but that might be only a matter of time.
One thing that surprised me about Hey memoQ was that memoQ chose to use the Nuance products. Some of you will have read that earlier this year Nuance discontinued its Swype keyboards for both Android and iOS, which had also provided free access to voice recognition. I'm not completely sure that the reason was the powerful rise of Google's Gboard, but chances are it was. Gboard provides keyboard as well as voice access to hundreds and hundreds of languages. It's not clear whether all the languages listed (click on "See supported languages" at the bottom) are voice-supported, but either way the list is much, much longer than Nuance's, and likely has a more secure future. Maybe the next version of Hey memoQ will (have to) make that switch.
The second (and third) feature I like in memoQ is the video preview tool. memoQ is not the first with a tool like this (Star Transit has had one for quite a while, and Wordbee's came out essentially simultaneously with memoQ's), but it's still important and kind of a no-brainer given the incredible rise in subtitle translation. The tool is based on the VLC Media Player -- which you're likely already familiar with because it's probably installed on your computer anyway -- and supports essentially any video format that player supports. The only caveat is that you need not just the video but also a separate subtitle file (either an SRT file or an Excel file that contains the translatable text as well as the time stamps). The information in that subtitle file determines which position of the video is shown as you translate and as you review your translated subtitles in the preview. You can also choose to play longer passages spanning a number of subtitles to get a better idea of the context in the video.
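To make the subtitle-file requirement concrete: SRT is a plain-text format where each cue carries an index, a start/end time stamp pair, and the subtitle text, and those time stamps are exactly what lets a preview tool jump to the right spot in the video. Here's a minimal sketch of what such a file looks like and how its cues could be pulled apart -- this is just the generic SRT format, nothing memoQ-specific.

```python
import re

def parse_srt(text):
    """Parse a minimal SRT file into (start, end, text) entries.

    Each SRT cue is an index line, a "start --> end" time stamp
    line, and one or more text lines, with blank lines between
    cues. Multi-line cue text is joined into a single string.
    """
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed cues
        start, end = [t.strip() for t in lines[1].split("-->")]
        entries.append((start, end, " ".join(lines[2:])))
    return entries

sample = """\
1
00:00:01,000 --> 00:00:03,500
Hello, world!

2
00:00:04,000 --> 00:00:06,000
This line spans
two rows.
"""

print(parse_srt(sample))
```

The time stamps (hours:minutes:seconds,milliseconds) are what a preview tool synchronizes against; the text lines are what you'd actually translate.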
In addition, since memoQ built this on the VLC Media Player, it had to open-source the code for the preview tool. By the time you receive this Tool Box Journal, the code should have been posted to GitHub, and it might really be very useful. For instance, it could easily be used to build a preview/synchronization tool for video games, for software localization, for other translation management systems, and so on.
And maybe, just maybe, this is the first step to a library of third-party apps that memoQ might offer at some point?
Of course, Gergely is right when he says that memoQ "should focus on core translation technology" and leave the -- albeit important -- rest to others.