1. Google's Neural Jump (Premium Content)
Surely there's not a single one among us who hasn't heard about the changes in how Google Translate works for a number of languages (EN <> FR, DE, ES, PT, ZH, JA, KO, TR -- and many more will certainly soon be announced). In these language combinations, the previous phrase-based statistical model has been replaced by a neural machine translation system that potentially offers better translations.
It might be surprising to hear, though, that while this system is in full force at the web-based translate.google.com, the API that many translation environment tools use to access Google Translate uses the old system.
Here is an example of how different the translations might look. The first is a translation suggestion from within a translation environment tool (so it's the "old" Google Translate), and the second stems from the new version via the web interface.
Even if you don't read German, you'll recognize that the translations are very different. If you do read German, you will agree that the neural MT suggestion is better overall -- especially because it was able to evaluate the context and therefore translate "tiles" not as bathroom tiles but as Windows software tiles. This is exactly NMT's claim to fame: the ability to take the larger context into consideration.
Now, depending on how you use Google Translate in your translation environment -- if you use it at all -- it might be a good thing to access the old system. As mentioned above, the old system is a "phrase-based" SMT system in which, as I've written many times before, the individual phrases might actually be very worthwhile to use, especially if your translation environment tool offers you a way to access them easily phrase by phrase.
Still, I was interested to see whether it's possible to bring the new system into our translation workflow, but so far Google has been anything but clear on that. You can apply for access -- which I did, but I haven't heard back (I'll let you know on Twitter if and when I do) -- but it's unclear to me whether it will actually be usable with the current implementations in existing tools and how much it will cost (Google just says it will be more than $20 per million characters -- the price for the old system). Again, I'll let you know once I find out more.
Oh, and if you wanted to read a little more on neural machine translation, I tried to explain it right here. Interestingly, the two examples I gave as samples for a likely successful translation in neural MT (the English sentence "The music world mourns the death of Prince" and the German sentence "Ich fahre den Fußgänger um" -- "I run over the pedestrian") are still incorrectly translated by the new Google Translate.
2. Morphing into the Promised Land
Some of you know that I've been very interested in morphology. No, let me put that differently: I've been very frustrated that the translation environment tools we use don't offer morphology. There are some exceptions -- such as SmartCat, Star Transit, Across, and OmegaT -- that offer some morphology support. But all of them are limited to a small number of languages, and any effort to expand these would require painful and manual coding.
Other tools, such as memoQ, have decided that they're better off with fuzzy recognition than specific morphological language rules, but that clearly is not the best possible answer either.
So, what is the problem? And what is morphology in translation environment tools about in the first place?
Well, wouldn't it be nice to have all inflected forms of any given word in your source text be automatically associated with the uninflected form that is located in your termbase or glossary and have that displayed in your terminology search results? And does it feel a little silly to even have to ask that question at a point when it should be a no-brainer to have any given tool provide that service? In case you wondered: The answer to both questions is "Yes, yes, resoundingly yes!"
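To make that concrete, here is a minimal Python sketch of such a lookup. It assumes a tiny, hand-written list of English suffixes and a termbase held as a plain dictionary (both hypothetical). A real tool would need far richer rules for every language, which is exactly the cost problem that follows.

```python
# Hypothetical, hand-written English suffix list. Maintaining lists like this
# for every language is the expensive part.
SUFFIXES = ["ings", "ing", "ies", "es", "ed", "s"]

def candidate_lemmas(word: str) -> list:
    """Return the word itself plus every form obtained by stripping one known suffix."""
    word = word.lower()
    candidates = [word]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            candidates.append(stem)
            if suffix == "ies":
                candidates.append(stem + "y")  # e.g. "libraries" -> "library"
    return candidates

def find_terms(source: str, termbase: dict) -> dict:
    """Map each inflected source word to the termbase entry of its uninflected form."""
    hits = {}
    for token in source.split():
        token = token.strip(".,;:!?")
        for lemma in candidate_lemmas(token):
            if lemma in termbase:
                hits[token] = termbase[lemma]
                break
    return hits

termbase = {"tile": "Kachel", "library": "Bibliothek"}
print(find_terms("Pin the tiles to the libraries.", termbase))
# {'tiles': 'Kachel', 'libraries': 'Bibliothek'}
```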
On the other hand, there is a reason why we're stuck where we are. It happens to be cost. If you really have to manually enter morphology rules for all languages, it quickly becomes a Sisyphean exercise (starting with: "What exactly are all languages?"). If you do it just for the "important" (which in the eyes of the technology vendors means "profitable") languages, you end up with the situation we already have with the tools mentioned above.
A few years ago, a group of folks including myself had the idea to crowdsource the collection of morphology rules for and with each language-specific group of translators. Once the rules were collected, they could then be integrated into the various technologies. It sounded good, but it was hard to get the project started due to a lack of funds to build the necessary infrastructure and/or the time it would have taken to raise funds, among other issues.
Enter translation environment tool Lilt with a very cool proposal that may very well be the solution. Lilt's latest version introduces a "neural morphology" engine for all presently supported languages minus Chinese (so: EN, DA, NL, FR, DE, IT, NO, PL, PT, RU, ES, SV).
Here is the honest truth, though: When I first read the press release a couple of weeks ago, I fondly rolled my eyes and thought to myself that the folks from Lilt were just thinking it was wise to throw a little "neural" around while it's hot.
I was mistaken, however, as I found out when I talked with Lilt's John DeNero, who is the architect of this part of Lilt's system. John tried to explain to me what the system does and why it can make a big difference. The second part was not so hard to understand, but my feeble untechnical mind had a hard time with the first part.
(By the way, we always assume that it's us, the less-technically-inclined, who are to be pitied when we don't understand technology. But can you imagine how pitiful life is for the more-technically-inclined who have to speak baby talk when communicating with us?)
This article provides a good summary of the system, which essentially analyzes large monolingual corpora, detects morphological modifications (in theory, they could be any kind of modification; in practice, Lilt focuses on suffixes right now), and classifies them. Since every word is evaluated and classified within its context, the system is able to tell that "gladly" is "glad" plus the adverbial ending -ly, while "only" is not "on" plus -ly. Using the same contextual analysis, the system is also able to make very educated guesses about the morphological transformation of unknown words. (For instance, it might never have encountered "loquacious," but chances are it would assume -- correctly -- that the adverbial form is "loquaciously.")
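The following is not Lilt's neural system. It is a deliberately naive, frequency-based Python sketch of the underlying idea: learn from a monolingual word list which word endings tend to take which suffixes, then apply that knowledge to a word never seen before. The word list, the three suffixes, and the use of a word's last three letters as its "context" are all assumptions for illustration.

```python
from collections import Counter
from typing import Optional

def learn_suffix_rules(vocabulary: set, suffixes=("ly", "ness", "s")) -> Counter:
    """Count (last three letters of a base word, suffix) pairs where both
    the base word and base word + suffix occur in the vocabulary."""
    rules = Counter()
    for word in vocabulary:
        for suffix in suffixes:
            if word + suffix in vocabulary:
                rules[(word[-3:], suffix)] += 1
    return rules

def predict_form(word: str, suffix: str, rules: Counter) -> Optional[str]:
    """Guess word + suffix for an unseen word if words with the same ending took that suffix before."""
    if rules[(word[-3:], suffix)] > 0:  # a missing key counts as 0
        return word + suffix
    return None

vocab = {"glad", "gladly", "famous", "famously", "nervous", "nervously", "kind", "kindness"}
rules = learn_suffix_rules(vocab)
print(predict_form("loquacious", "ly", rules))  # loquaciously
```

Note the weakness: if the word list contained both "on" and "only," this counter would happily analyze "only" as "on" plus "-ly." Telling those cases apart is precisely what the contextual analysis described above is supposed to do.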
This works with every language (that uses morphology -- therefore excluding Chinese, for instance), provided there is enough corpus material to train the system. The time it takes for a new language to be trained is about 2.5 days (on very powerful computers). That's it.
Now, it's not perfect (whatever is??). John was very open in his assessment of where the system fails. It tends to fail with irregular morphology (it might not recognize "geese" as the plural of "goose" or "well" as the adverbial form of "good"), and in about 5% of all cases, John felt, the engine should have made a correct judgment but did not.
On the other hand, terminology hits have increased by a third for its users since Lilt introduced the system two weeks ago.
I consider this a quantum leap -- in particular because it will benefit not only the large European and Asian languages (where applicable) but the long tail of other languages as well. Well, you might say, Lilt covers only a handful of languages, so doesn't that end up being the same thing? The answer to that is a twofold no. First of all, you can expect Lilt to continue to add languages, and -- even more importantly -- the module used to build these neural morphology engines is open-source and available for every translation technology developer right here.
This is what John said about the available engine and its usability:
"Here's our open-source release of the morphology system. It's released as an academic project and does not have any formal support, so it's not a product. If someone wanted to use it, they'd have to figure it out on their own (though of course I'm happy to answer questions)."
So, get on it, Kilgray and SDL and Atril and Wordfast and, and, and . . . .
It's also very promising that there are other areas where morphological knowledge can be used by a translation system: How about actively changing the inflection of a term that is automatically inserted based on its usage in the source? Or how about changing that inflection when repairing fuzzy matches? Or when repairing machine translation suggestions?
The sky's the limit with this. Be creative!
3. The Tech-Savvy Interpreter: The Rise of Interpreting Management Systems and Why You Should Care (Column by Barry Slaughter Olsen)
Talk to anyone responsible for staffing interpreting assignments and you'll quickly discover just how time-consuming and inefficient the task can be. It is complex, with a lot of moving parts to coordinate: language combinations, expertise, time, location, duration, subject matter, turnaround time, certifications, compliance, client preference, availability, type of assignment, interpreting equipment, and the list goes on...
In fact, market research recently conducted in one country revealed that agencies spend an average of 40 minutes to staff just one interpreted encounter, and that doesn't include all the administrative work needed after the assignment is complete to get the interpreter paid! Factor in that most of the growth in interpreting is coming in areas where assignments often last two hours or less, and you can begin to understand why making all the administrative aspects around an interpreting assignment more efficient is so important.
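Here is that arithmetic spelled out in Python, using the 40-minute figure from the research above. The assignment lengths are illustrative assumptions.

```python
STAFFING_MINUTES = 40  # average staffing time per encounter, from the research cited above

def staffing_overhead(assignment_minutes: float, staffing_minutes: float = STAFFING_MINUTES) -> float:
    """Staffing time as a fraction of the interpreting time it makes possible."""
    return staffing_minutes / assignment_minutes

for minutes in (120, 60, 30):  # illustrative assignment lengths (assumption)
    print(f"{minutes:>3}-minute assignment: {staffing_overhead(minutes):.0%} overhead")
# 120-minute assignment: 33% overhead
#  60-minute assignment: 67% overhead
#  30-minute assignment: 133% overhead
```

Even a two-hour job carries a third of its own length in staffing time before any invoicing or payment work begins.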
As the demand for interpreting grows and the types of interpreted encounters continue to diversify, the process of matching interpreters to clients must become more efficient in order to meet demand while reducing administrative costs.
Interpreting Management Systems
Enter interpreting management systems or IMSes. The best core definition I have found for an IMS is from Hélène Pielmeier at Common Sense Advisory, who defines them as "applications designed to schedule and manage interpreting assignments, whether on site or remote." This definition gets at the heart of what an IMS does, but as you will see from the list provided in this month's column, many IMSes go far beyond the core definition to include delivery platforms, community building, and referral programs, to name just a few innovations. There will surely be more innovation to come as this space continues to evolve.
Two clear trends emerging in this IMS space are increased efficiency and convergence.
Increased efficiency is the new black. Lead times for staffing interpreted encounters are getting shorter and shorter. This means clients need to have open, fluid channels of communication with interpreters, and response times are critical. Expect to see more systems use instant messaging to communicate with interpreters. Interpreters should think carefully about how they are willing to interact, what communication technologies they are willing to monitor regularly, and how responsive they will be. Responsiveness, even when rejecting jobs, is becoming a key metric for project managers when deciding which interpreters to work with. In many cases, getting back to them tomorrow won't be good enough anymore.
All roads lead to convergence. While each of the various IMSes listed below is focused on a specific niche of the interpreting market (e.g., medical interpreting, conference interpreting, business interpreting, etc.), they all seek to integrate the various aspects of the staffing process into a single workflow. Some, like BoostLingo and TikkTalk, have actually built in their own video remote interpreting platform as well. They aim to be one-stop shops. Other platform innovations include GPS tracking to offer assignments to the interpreter closest to the job, smart matching using artificial intelligence to assign work based on interpreter availability and credentials, features to confirm interpreter check-in at assignments, client and interpreter evaluation, billing, invoicing, payment processing, compliance, report generation, and more. The competitive edge will go to agencies and interpreters that are able to adapt to and thrive in this new environment.
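As an illustration of the matching features listed above, here is a minimal rule-based Python sketch: filter interpreters by availability, language pair, and credential, then rank the rest by great-circle distance to the job. The `Interpreter` fields and the ranking rule are assumptions, and this is plain filtering and sorting, not the AI-based matching some platforms advertise.

```python
import math
from dataclasses import dataclass

@dataclass
class Interpreter:
    name: str
    languages: frozenset    # language codes, e.g. frozenset({"en", "es"})
    credentials: frozenset  # e.g. frozenset({"medical"})
    available: bool
    lat: float
    lon: float

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on Earth, in kilometres."""
    r = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def rank_candidates(pool, pair, required_credential, job_lat, job_lon):
    """Keep available interpreters who cover the language pair and hold the credential,
    then sort them nearest first."""
    eligible = [
        i for i in pool
        if i.available and set(pair) <= i.languages and required_credential in i.credentials
    ]
    return sorted(eligible, key=lambda i: haversine_km(i.lat, i.lon, job_lat, job_lon))
```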
Examples of IMSes
The following is a list of different interpreting management systems currently on the market. I have loosely organized them into three categories. I encourage you to check them out. The interpreter matchmaking sites and one-stop shops are potential sources of work for freelance interpreters. The last category, IMSes for interpreting service providers, contains software platforms designed for ISPs or other entities that have large interpreting programs to coordinate, such as hospitals and courts.
Interpreter Matchmaking Sites
IMSes for Interpreting Service Providers
This list is not exhaustive but does give a sense of the growing interest in making the interpreting workflow more efficient, which will only improve access to this important service. Many interpreting agencies and large institutions are developing or have already developed custom IMSes for their own operations.
A Word of Caution to Developers
There is a definite trend toward consolidation and convergence, but I am skeptical of platforms that are seeking to be all things to all clients. Over the last ten years, I have seen several platforms take the any-language-anytime-anywhere approach. None, to my knowledge, has been successful. Most sputtered out under the weight of the promises made.
The IMSes that are showing promise are those that differentiate their platform from the competition, either by offering unique services that specific markets need (e.g., end-to-end service for the medical interpreting market) or by carving out a niche for a certain type of interpreted encounter (e.g., focusing on business interpreting gigs and becoming adept at staffing last-minute requests).
Developers can look to the more mature translation management system (TMS) market, which has many different product offerings today, to see that there is ample room for competition and differentiation.
The Promise of Disintermediation for Interpreters
Freelance interpreters interested in developing more direct relationships with clients will find the interpreter matchmaking sites of greatest interest. These sites offer the convenience of an IMS (simple process for accepting or rejecting assignments, no invoicing necessary, direct deposit of payment, etc.) but allow the interpreter to negotiate fees directly with the end client and provide full transparency regarding rates and fees. These sites have built their business models around alleviating the administrative burden for the interpreter while still providing access to the end client.
Do you have a question about a specific technology? Or would you like to learn more about a specific interpreting platform, interpreter console, or supporting technology? Send us an email at email@example.com.
4. TAUS . . . (Premium Content)
. . . or the "Translation Automation and User Society," may not have the best reputation among some of our colleagues. While I have had a few disagreements with some TAUS members' viewpoints, I have appreciated many of TAUS' endeavors. In fact, I was present at its first meeting (in Taos, New Mexico, in 2007) where much of the initial focus was set out. (That was also the meeting where I locked myself out of my room while completely naked in the hot tub on my terrace and had to proudly walk in all my very natural glory across the huge terrain of the conference resort to ask the receptionist to please, PLEASE! give me a key. I promised my wife I would never tell that story publicly. Sorry, honey!)
TAUS has gone through a number of metamorphoses. Today it has three arms:
- The "Quality Dashboard" offers a set of tools to apply the "Dynamic Quality Framework," a system of quality metrics for translation (see edition 236 of the Tool Box Journal in the archives).
- An "Academy" offers a large collection of reports and other materials as well as post-editor training. (This, by the way, is one of my disagreements with the folks from TAUS -- that MT work for translators is seen solely through the lens of post-editing rather than through more creative uses.)
- Finally, the "Data Cloud" is really where it all started for TAUS.
The Data Cloud was just spiffed up, and there are a couple of really interesting components from a translation professional's perspective, the most interesting (and badly underused) of which has to be the Data Cloud Search. This search is an interface that connects you to large repositories of TM data which you can search through in a large number of language combinations on a word or phrase level. You can filter the data by industry and client, and you'll get quick information about the usage percentages of one translation vs. another and so on and so forth. It's really very helpful, and I use it a lot when I work in one of the areas covered by the database.
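For readers curious about what "usage percentages" means in practice, here is a minimal Python sketch over a hypothetical in-memory TM: find the units whose source contains a term, optionally filter by industry, and report how often each candidate translation appears in the targets. This is an assumption about the general idea, not how TAUS actually implements its search.

```python
from collections import Counter

def translation_usage(tm, source_term, candidates, industry=None):
    """Share of matching TM units whose target contains each candidate translation.

    tm: list of (source, target, industry) triples. Matching is naive substring
    search, so "file" would also match "profile".
    """
    counts = Counter()
    for source, target, domain in tm:
        if industry is not None and domain != industry:
            continue  # apply the industry filter
        if source_term.lower() not in source.lower():
            continue
        for candidate in candidates:
            if candidate.lower() in target.lower():
                counts[candidate] += 1
    total = sum(counts.values())
    return {c: counts[c] / total for c in candidates} if total else {}

tm = [
    ("Save the file", "Speichern Sie die Datei", "IT"),
    ("Open the file", "Öffnen Sie die Datei", "IT"),
    ("File a complaint", "Reichen Sie eine Beschwerde ein", "legal"),
    ("Delete the file", "Löschen Sie das Dokument", "IT"),
]
print(translation_usage(tm, "file", ["Datei", "Dokument"], industry="IT"))
```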
Two of the downsides to the tool are that its data is (still) very IT-heavy and (by now) a little outdated. (Companies like Microsoft and others have not updated their data for a long time -- which in the case of Microsoft with its own slick search tool is understandable; for other companies it's a little frustrating.)
I asked Anna Samiotou from TAUS about these issues. Here is what she answered:
"As you know, most of the companies that had originally submitted data are from the IT industry and because of this, a large part of the Data Cloud contents is in the IT domain. The data they had contributed has been demanded a great deal and proven to be useful for training MT engines. They have not been submitting data recently. We are in contact with them to engage them to contribute new data, which in general they are willing to do, but such contribution may need some extra effort from their side.
"As we are interested in populating more industry domains in the Data Cloud, we are in contact with companies from other industries such as e-commerce and travel.
"Recent contributions this year include uploads from Alibaba (app. 270M source words in English-Chinese), Lingua Custodia (app. 14M source words in English-French, German-English, French-Italian, French-German, English-Spanish, Japanese-English, etc.), individual users/translators (app. 214M source words, English-Chinese), and others (app. 4B including public data). The previous year we had around 7B source words contributed in European and Asian languages (incl. public data)."
The takeaway? Be sure to continue to check it out and use it where appropriate. You can use it right from within the TAUS interface or from a tool such as IntelliWebSearch, which is my preferred method of access. (You can find sample code for programming a search in IWS, which you will have to adjust to your preference, right here.)
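Web search tools like IntelliWebSearch work, in essence, by inserting the text you've selected into a search URL. The Python sketch below shows only that step. The URL template is a made-up placeholder, not the actual TAUS search address, so you would substitute whatever your own configuration uses.

```python
from urllib.parse import quote

# Hypothetical template: NOT the real TAUS search address. {query} receives the
# selected text; {src} and {tgt} receive language codes.
SEARCH_TEMPLATE = "https://search.example.com/concordance?q={query}&src={src}&tgt={tgt}"

def build_search_url(selected_text: str, src: str, tgt: str,
                     template: str = SEARCH_TEMPLATE) -> str:
    """Percent-encode the selected text and insert it, plus the language codes, into the template."""
    return template.format(query=quote(selected_text.strip(), safe=""), src=src, tgt=tgt)

print(build_search_url("loading screen", "en-US", "de-DE"))
# https://search.example.com/concordance?q=loading%20screen&src=en-US&tgt=de-DE
```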
Beyond the search feature, there are a number of other things you can do in the Data Cloud, including uploading your data in exchange for other data. Here you can find an overview of free and paid services.
In addition, I'd recommend that you stop by the monthly Translation Technology Webinar (full disclosure: I'm often part of the panel) to look at new and not-so-new ideas in the technology space.
Jean-François Richard from Quebecois translation technology developer and vendor Terminotix spent some time earlier this week on the phone with me to give me an idea about the newest version of AlignFactory, due to be released in the first quarter of 2017.
I wrote the following in my Tool Box ebook about the old version of AlignFactory:
AlignFactory offers an uncommonly high accuracy of alignment [=converting independent source and target files into a translation memory or corpus] because a) it uses a highly sophisticated alignment engine and b) it uses a number of filters that filter out any unlikely match (for instance, based on differing lengths of segments).
With AlignFactory you can also select thousands of file pairs (including PDF files), have them matched up (they have to follow certain naming conventions, such as including a language identifier), and then have them aligned in one big swoosh. And it really is one big swoosh: the speed of the alignment is mind-boggling. In fact, it's so fast that I have repeatedly thought that something had gone wrong only to find that it had already successfully completed the alignment. While it's not perfect, it certainly has brought alignment to a different level.
AlignFactory really is a great tool, and I love doing demos of it during workshops -- the speed and accuracy never fail to impress an audience.
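The length-based filtering mentioned in the excerpt above can be illustrated with a minimal Python sketch that discards candidate segment pairs whose character lengths differ too much. The 2.0 ratio is an arbitrary assumption, and AlignFactory's actual engine is presumably far more sophisticated than this.

```python
def plausible_pair(source: str, target: str, max_ratio: float = 2.0) -> bool:
    """Keep a candidate pair only if neither side is more than max_ratio times longer."""
    if not source.strip() or not target.strip():
        return False  # an empty side can never be a valid translation pair
    longer = max(len(source), len(target))
    shorter = min(len(source), len(target))
    return longer / shorter <= max_ratio

def filter_pairs(pairs, max_ratio: float = 2.0):
    """Drop aligned pairs that fail the length check."""
    return [(s, t) for s, t in pairs if plausible_pair(s, t, max_ratio)]

pairs = [
    ("Save your changes before closing.", "Speichern Sie Ihre Änderungen vor dem Schließen."),
    ("Save your changes before closing.", "OK"),  # an unlikely match
]
print(filter_pairs(pairs))  # only the first pair survives
```

German typically runs longer than English, so a threshold like this has to leave room for normal expansion between languages.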
In the upcoming version, Jean-François and his team have integrated an interesting feature: a web crawler. A web crawler is a tool that can download complete websites onto your hard drive. Originally developed when it was expensive to spend a long time online, it allowed you to mirror websites on your computer and browse them offline. While most users no longer need this, translators have been real beneficiaries of that legacy technology. Tools like HTTrack, Teleport, and Quadsucker are all helpful for downloading complete translated websites or (more likely) just the file types that are useful for alignment purposes.
So it was just a logical next step for Terminotix to build this right into the tool. Since they didn't feel that any of the existing solutions really matched their needs, they developed it from scratch. In fact, they had already done that a while back to help existing clients download their own websites and build a corpus that they could then query with Terminotix's LogiTerm (see edition 220 of the Tool Box Journal for a review of LogiTerm).
With the new web crawler, AlignFactory now downloads the relevant files from translated websites (excluding image, video, or audio files or files that contain only coding), automatically matches them up according to defined language pairs, and aligns them to be output as TMX (translation memory exchange) files or corpus files that can be used with Terminotix's other tools.
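The final step, writing aligned pairs out as TMX, can be sketched with Python's standard library. The header values, including the creation tool name, are placeholders, and this is only a minimal TMX 1.4 skeleton, not Terminotix's actual output.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_tmx(pairs, srclang: str = "en-US", tgtlang: str = "fr-CA") -> str:
    """Return a minimal TMX 1.4 document with one <tu> per (source, target) pair."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "alignment-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en-US",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text  # special characters are escaped automatically
    return ET.tostring(tmx, encoding="unicode")

print(build_tmx([("Contact us", "Nous joindre")]))
```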
If the website is clearly structured (such as all French files in a directory called FR and identical HTML file names), AlignFactory will just use those markers to match the file pairs. If it's not quite that obvious, it will use a "heuristic" method in which it looks for "fingerprints" within each file to identify its language and then pair it with its counterpart in the other language.
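Here is a minimal Python sketch of that two-stage logic: trust a language directory in the path if there is one, and otherwise compare character-trigram "fingerprints" against small reference samples. The directory convention, the reference texts, and the cosine-similarity measure are assumptions for illustration, not a description of Terminotix's heuristic.

```python
import math
from collections import Counter

def trigram_profile(text: str) -> Counter:
    """Count character trigrams (text padded with spaces) as a crude language fingerprint."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def guess_language(text: str, references: dict) -> str:
    """Return the reference language whose fingerprint is most similar to the text."""
    profile = trigram_profile(text)
    return max(references, key=lambda lang: cosine(profile, trigram_profile(references[lang])))

def detect_language(path: str, text: str, references: dict) -> str:
    """Stage 1: trust a language directory in the path (e.g. 'site/fr/page.html').
    Stage 2: otherwise fall back to comparing trigram fingerprints."""
    for part in path.lower().split("/")[:-1]:  # directory components only
        if part in references:
            return part
    return guess_language(text, references)

# Tiny hypothetical reference samples; real profiles would be built from much more text.
REFERENCES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "fr": "le renard brun saute par-dessus le chien paresseux et le chat",
}
print(detect_language("site/pages/p2.html", "le chien et le chat", REFERENCES))  # fr
```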
The regular version of AlignFactory is not exactly cheap at CAN$1500, and the price will go up to CAN$2000 once the new version is released. The "Light" version is a lot cheaper but will not contain the new crawler tool. However, since the Terminotix team is eager to find out how its new tool works in all kinds of situations, it'll give you full support alongside the free 45-day trial version. This should allow you to get to the data you've long pined for and make it useful. Just send an email to firstname.lastname@example.org and they'll give you access. (And you'll finally have something to do over those boring Christmas holidays...)
There are a couple of important follow-ups from my article about the new version of SDL Trados Studio 2017.
One is a correction. In the last Tool Box Journal I said this:
Of course, one thing you'll need to take into account when thinking about using this feature is that it requires an additional paid subscription to the SDL Language Cloud -- much like what also requires with its cost to use its API (which in turn makes it possible to use that service within a translation environment tool).
The link is correct, but I apparently hadn't studied that web page adequately. Four hundred thousand characters per month are indeed free with SDL's new machine translation offering. From a single freelance translator's perspective, that essentially means you can use the solution for free.
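To put the free allowance in perspective, here is the arithmetic in Python. The six-characters-per-word average is an assumption (it varies by language), and the $20 per million characters is the price of the old Google Translate API mentioned in the first article.

```python
FREE_CHARS_PER_MONTH = 400_000       # SDL's free monthly allowance, as stated above
GOOGLE_USD_PER_MILLION_CHARS = 20.0  # old Google Translate API price
CHARS_PER_WORD = 6                   # assumption: about 5 letters plus 1 space per English word

free_words = FREE_CHARS_PER_MONTH / CHARS_PER_WORD
google_equivalent = FREE_CHARS_PER_MONTH / 1_000_000 * GOOGLE_USD_PER_MILLION_CHARS

print(f"~{free_words:,.0f} source words per month")           # ~66,667 source words per month
print(f"${google_equivalent:.2f} at Google's old API price")  # $8.00 at Google's old API price
```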
My apologies for portraying it otherwise.
The other thing I didn't mention had to do with the much-praised upLIFT solution. I had not realized (and hadn't even considered looking for) a feature that reader Amy Bryant informed me of: unlike competing tools, SDL takes fragment recall into consideration in project analysis -- which could then potentially be used to price projects with a new kind of fuzzy match rate.
I asked upLIFT's architect Kevin Flanagan about it, and this is some of what he said:
"I just checked which way this went, and yes, in Studio 2017, leverage reports will include upLIFT statistics ... but it's important to be clear that they're totaled separately from existing statistics, and (as I understand it) SDL is not advising their use for any kind of discounting, and the LSP arm of SDL has been briefed not to use them in that way. You might think (as it struck me) that we should therefore leave those numbers off the reports. The reasoning for having them that interested me most goes along these lines: 'If we don't quantify fragment recall for translators -- notwithstanding that recalled fragments (like recalled fuzzy segment matches) won't always be useful -- how will a translator judge whether she/he will be able to complete the project in a day less, and therefore be able to take on more work and earn more?' Bearing in mind that a Studio user can set fragment minima, e.g. minimum fragment length of 6 words, this kind of reporting does help identify the not-that-uncommon cases where good segment matches are few, but fragment recall really can make a big productivity difference (though we might need better reporting for that, e.g. identifying where a 20-word segment had no segment-level match but was entirely covered by two 10-word fragments).
"Does that mean that fragment recall statistics should never, and will never, be used for any kind of discounting? That's harder to say. On the one hand, (...) there are reasons to think technology is moving us towards an hourly-rate charging system anyway, in which case the question may become irrelevant."
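The kind of report Kevin describes (a segment with no segment-level match that is nevertheless fully covered by fragments) can be sketched in Python with a greedy longest-match search for word n-grams of at least a minimum length. This is an illustrative assumption, not upLIFT's actual algorithm.

```python
def contains(haystack: list, needle: list) -> bool:
    """True if needle occurs as a contiguous run of words inside haystack."""
    n = len(needle)
    return any(haystack[k:k + n] == needle for k in range(len(haystack) - n + 1))

def recall_fragments(segment: str, tm: list, min_len: int = 6) -> list:
    """Greedily collect the longest word runs (>= min_len words) of segment found in any TM segment."""
    words = segment.lower().split()
    tm_words = [s.lower().split() for s in tm]
    found, i = [], 0
    while i < len(words):
        best = 0
        for j in range(len(words), i + min_len - 1, -1):  # try the longest span first
            if any(contains(t, words[i:j]) for t in tm_words):
                best = j - i
                break
        if best:
            found.append(" ".join(words[i:i + best]))
            i += best
        else:
            i += 1
    return found

def coverage(segment: str, tm: list, min_len: int = 6) -> float:
    """Fraction of the segment's words covered by recalled fragments."""
    total = len(segment.split())
    covered = sum(len(f.split()) for f in recall_fragments(segment, tm, min_len))
    return covered / total if total else 0.0
```

With a 20-word segment whose first and last ten words each appear in a different TM segment, `recall_fragments` returns two 10-word fragments and `coverage` returns 1.0, even though no TM segment matches the whole sentence.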
I sort of get what he's saying. And who am I to say that a tool should not display all available data that could have an impact on the productivity of said tool? What I find notable is that other tool vendors that use fragment recall (AKA subsegment matching) could have done something similar but chose not to.
But then, it's up to the individual translator to not accept a rate change that takes subsegment use into consideration. And maybe (make that: hopefully) we'll already have switched to a time-based model at the point where it could become standard to use those numbers.
Oh, and there is another erratum from the last Tool Box Journal. I implied that memoQ's Muses (the databases that contain subsegments) are being dynamically updated. As reader Anthony Green pointed out, this is simply and sadly not true.
7. New Password for the Tool Box Archive
As a subscriber to the Premium version of this journal you have access to an archive of Premium journals going back to 2007.
You can access the archive. This month the user name is toolbox and the password is frighteningencounters.
New user names and passwords will be announced in future journals.
The Last Word on the Tool Box Journal
If you would like to promote this journal by placing a link on your website, I will in turn mention your website in a future edition of the Tool Box Journal. Just paste the code you find here into the HTML code of your webpage, and the little icon shown on that page, with a link to my website, will be displayed on yours.
If you are subscribed to this journal with more than one email address, it would be great if you could unsubscribe redundant addresses through the links Constant Contact offers below.
Should you be interested in reprinting one of the articles in this journal for promotional purposes, please contact me for information about pricing.
© 2016 International Writers' Group