Saturday, February 25, 2017

Google Machine Translation

[Retrieved 25.2.2017]

Machine Translation

Machine Translation is a great example of how cutting edge research and world class infrastructure come together at Google. We focus our research efforts towards developing statistical translation techniques that improve with more data and generalize well to new languages. Our large scale computing infrastructure allows us to rapidly experiment with new models trained on web-scale data to significantly improve translation quality. This research backs the translations served at, allowing our users to translate text, web pages and even speech. Deployed within a wide range of Google services like GMail, Books, Android and web search, Google Translate is a high impact, research driven product that bridges the language barrier and makes it possible to explore the multilingual web in 90 languages. Exciting research challenges abound as we pursue human quality translation and develop machine translation systems for new languages.

43 Publications

Neural Machine Translation

[Retrieved from on 25.02.2017]

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean
(Submitted on 26 Sep 2016 (v1), last revised 8 Oct 2016 (this version, v2))
Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units ("wordpieces") for both input and output. This method provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system.
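The beam search scoring mentioned at the end of the abstract (length normalization plus a coverage penalty) can be illustrated with a short Python sketch. It follows the scoring formulas as I understand them from the decoder section of the paper (arXiv:1609.08144). The function names, the default values of alpha and beta, and the small epsilon that guards against log(0) are my own assumptions for illustration and are not taken from GNMT itself.

```python
import math


def length_penalty(target_length, alpha=0.2):
    """lp(Y) = (5 + |Y|)^alpha / (5 + 1)^alpha; alpha default is illustrative."""
    return ((5.0 + target_length) / 6.0) ** alpha


def coverage_penalty(attention, beta=0.2, eps=1e-12):
    """cp(X;Y) = beta * sum_i log(min(sum_j p_ij, 1.0)).

    attention[j][i] is the attention weight that target word j puts on
    source word i. eps is an assumption of this sketch: it keeps log()
    defined when a source word receives no attention at all.
    """
    n_source = len(attention[0])
    total = 0.0
    for i in range(n_source):
        covered = sum(row[i] for row in attention)
        total += math.log(max(min(covered, 1.0), eps))
    return beta * total


def score(log_prob, attention, alpha=0.2, beta=0.2):
    """s(Y,X) = log P(Y|X) / lp(Y) + cp(X;Y) for one beam hypothesis."""
    target_length = len(attention)  # one attention row per target word
    return (log_prob / length_penalty(target_length, alpha)
            + coverage_penalty(attention, beta))


if __name__ == "__main__":
    # Two hypotheses with the same log-probability: one attends to both
    # source words, the other ignores the second source word.
    covers_all = [[1.0, 0.0], [0.0, 1.0]]
    skips_one = [[1.0, 0.0], [1.0, 0.0]]
    print(score(-2.0, covers_all), score(-2.0, skips_one))
```

Dividing by the length penalty keeps longer hypotheses from being ranked lower merely because they accumulate more negative log-probability, while the coverage term lowers the score of hypotheses that leave source words unattended.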

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)
Cite as: arXiv:1609.08144 [cs.CL]
(or arXiv:1609.08144v2 [cs.CL] for this version)
Submission history
From: Mike Schuster
[v1] Mon, 26 Sep 2016 19:59:55 GMT (969kb,D)
[v2] Sat, 8 Oct 2016 19:10:41 GMT (968kb,D)

Wednesday, February 22, 2017

The Classical Language Toolkit (CLTK)

Retrieved from the website [22.02.2017]

The Classical Language Toolkit (CLTK) offers natural language processing (NLP) support for the languages of Ancient, Classical, and Medieval Eurasia. Greek and Latin functionality are currently most complete.


The goals of the CLTK are to:

  • compile analysis-friendly corpora;
  • collect and generate linguistic data;
  • act as a free and open platform for generating scientific research.

Academic Advisors

  • Neil Coffee, University at Buffalo (Associate Professor of Classics); Tesserae (Principal Investigator)
  • Gregory Crane, Universität Leipzig (Humboldt Chair of Digital Humanities), Tufts University (Professor of Classics); Perseus (Editor–in–Chief) and Open Philology (Director)
  • Peter Meineck, New York University (Associate Professor of Classics); Aquila Theatre (Founder), Ancient Greeks/Modern Lives (Founder, Director)
  • Leonard Muellner, Brandeis University (Professor Emeritus of Classical Studies); Center for Hellenic Studies (Director of Publications, Information Technology and Libraries)

Hunayn b. Ishaq - ܚܘܢܝܢ ܒܪ ܐܝܣܚܩ (808 - 873)
Identity from

James E. Walters et al., “Hunayn b. Ishaq — ܚܘܢܝܢ ܒܪ ܐܝܣܚܩ ” in A Guide to Syriac Authors, eds. David A. Michelson and Nathan P. Gibson, entry published August 17, 2016, The Syriac Reference Portal, ed. David A. Michelson.

“Physician, philosopher, theologian, and translator. His full name is Abū Zayd Ḥunayn b. Isḥāq b. Sulaymān b. Ayyūb al-ʿIbādī, and he was known in medieval Europe as Johannitius.”

From the website of the project [retrieved 22.2.2017] 


The Syriac Reference Portal is a digital project for the study of Syriac literature, culture, and history. Today, a number of heritage communities around the world have linguistic, religious or cultural identities with roots in Syriac language and culture. The portal exists to document and preserve these Syriac cultural heritages. The online tools published by the portal are intended for use by a wide audience including researchers and students, members of Syriac heritage communities and the interested general public. In order to meet the diverse needs of users, the design of the portal is inherently collaborative and fluid.
The primary function of the portal is to be a reference hub for digitally linking research findings. The portal's publications compile and classify core data for the study of Syriac sources, offer the scholarly community digital tools for freely disseminating that data, and facilitate further research through the creation of shared digital tools and infrastructure.

Saturday, February 4, 2017

Illicit Trade in Papyri: How It Works?

Arrested in Alexandria: report with images (see below), dated 28/1/2015, from the Al Arabiya website here:

Three golden ushabti!

A mummy!

More ushabti

A bust

Coins also!

Illicit Trade in Papyri: How It Works?

I have read a lot about the illicit trade in papyri, but I have never explored it further. In this series of posts, I will gather as much information as I can from what is reported in the Egyptian (Arabic) newspapers.

I will not try to comment on or translate any articles. I will just state the publication date of the report(s) as well as the names of the newspapers. I will of course read every detail in each report. I hope that in this way I will, in the end, have a clearer picture of how these artefacts are transferred from Egypt to their final destination(s), either in Europe or in the USA.

The first report I post here appeared in the Alwatan (The Home Country) newspaper on 15/4/2016. It reports that the Egyptian police arrested an antiquities dealer who had stored 9000 pieces (sic!) in his house in the district of Ain Shams. Papyri and manuscripts are said to have been found among these artefacts. The artefacts are said to come from Upper Egypt.

Here is the link to the report: