Yuval Marton

Top | Publications | Resources and Tools | Honors and Awards | Teaching | Organization | Additional Academic Activities | Other Activities | Bottom

 

 

Email: ymarton @t ccls.columbia.edu

 

 

News

I will give a three-hour tutorial at NAACL-HLT 2012, Montreal, Canada, June 3, 2012: “On-Demand Distributional Paraphrasing”.

 

Publication Chair of the First Joint Conference on Lexical and Computational Semantics (*SEM), co-located with NAACL-HLT 2012, June 7-8, Montreal, Canada.

 

I am co-organizing the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages (SP-Sem-MRL 2012), to take place during the ACL workshop period, July 12-14, 2012 (exact date TBD), in Jeju Island, Republic of Korea. See the CFP.

 

New data release: Columbia University Arabic (MSA) syntactic dependency data with functional morphological features (2011). See below.

 

I am a post-doctoral researcher at IBM T.J. Watson Research Center, Yorktown Heights, NY, working in the statistical machine translation group, focusing on translation to and from morphologically rich languages. My current work builds on my previous experience as a post-doctoral research scientist at the Columbia University Center for Computational Learning Systems (CCLS), where I worked with Nizar Habash and Owen Rambow on syntactic parsing, focusing on Arabic parsing for statistical machine translation (SMT), including subject detection, morphological features for parsing, and parser-integrated PP attachment.

 

Besides parsing and statistical machine translation, my research interests include lexical semantics (corpus-based semantic similarity measures) and paraphrase generation. More specifically, I am interested in infusing SMT and semantic measures with linguistic knowledge by incorporating soft syntactic and/or soft semantic constraints into various corpus-based models. I am also interested in applying rich morphological features and analysis to SMT, parsing, and paraphrasing.

 

My interests also span using and adapting machine learning (ML) methods for natural language processing (NLP) – and using linguistically informed learning bias and feature design to make such ML-for-NLP methods more effective.

 

I was a linguistics Ph.D. student at the University of Maryland (UMD), with a focus on computational linguistics. I was a member of the CLIP Lab at UMIACS, and I also frequented the CNL Lab. My advisors were Philip Resnik and Amy Weinberg. I defended my dissertation in September 2009. My dissertation, entitled “Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models”, explored using soft syntactic and semantic constraints in end-to-end state-of-the-art statistical machine translation systems. It also introduced a novel distributional paraphrase generation technique that can benefit from soft semantic constraints, and presented a generalized framework of which these soft semantic and syntactic constraints can be viewed as instances, and in which they can be potentially combined.

 

Following my interests in neuro-biologically plausible cognitive and linguistic models, I took several fascinating neuroscience courses in the Neuroscience and Cognitive Science (NACS) Program, and received the NACS Certificate. My qualifying paper focused on visual word recognition. I argued there for a lexical representation that consists of both lower-level visual features and higher-level abstract letter objects, interacting with statistical factors (word frequency) and partly innate factors (left or right visual field perception). During the second half of my studies, I did research in this area with Carol Whitney.

 

Further back in time, I was also involved in text classification research (authorship attribution and topic/genre classification). My advisor at the time was Lisa Hellerstein, when I was a computer science graduate student at the Polytechnic Institute of NYU (formerly Polytechnic University, Brooklyn, NY), where I received my Master’s degree in Computer Science.

 

 

Publications

Journals

Yuval Marton, Nizar Habash and Owen Rambow. “Improving Arabic Dependency Parsing with Surface and Functional Morphology Features”. Computational Linguistics (Accepted, in revision, 2012)

We explore the contribution of lexical and inflectional morphology features to dependency parsing of Arabic, a morphologically rich language. Using controlled experiments, we find that definiteness, person, number, gender, and undiacritized lemma are most helpful for parsing on automatically tagged (predicted) input. We contrast it with using gold input. We further contrast the contribution of form-based and functional features, and show that functional features for gender and number (e.g., “broken plurals”) and the related rationality (“humanness”) feature improve over form-based features. We show that training on a combination of predicted and gold features improves over the alternatives. We examine the contribution of these features – some of which were only recently introduced for Arabic NLP – in two transition-based parsers: MaltParser and Easy-First Parser.

Yuval Marton, David Chiang, and Philip Resnik. “Soft Syntactic Constraints for Arabic-English Hierarchical Phrase-Based Translation”. Machine Translation Journal Special Issues on Machine Translation for Arabic. Editor-in-Chief: Andy Way, Guest Co-Editors: Nizar Habash and Hany Hassan. Paginated version: Volume 26, Issue 1 (2012), pages 137-157. Online version: Journal no. 10590, 29 October 2011

In adding syntax to statistical machine translation, there is a tradeoff between taking advantage of linguistic analysis and allowing the model to exploit parallel training data with no linguistic analysis: translation quality versus coverage. A number of previous efforts have tackled this tradeoff by starting with a commitment to linguistically motivated analyses and then finding appropriate ways to soften that commitment. We present an approach that explores the tradeoff from the other direction, starting with a translation model learned directly from aligned parallel text, and then adding soft constituent-level constraints based on parses of the source language. We argue that in order for these constraints to improve translation, they must be fine-grained: the constraints should vary by constituent type, and by the type of match or mismatch with the parse. We also use a different feature weight optimization technique, capable of handling a large number of features, thus eliminating the bottleneck of feature selection. We obtain substantial improvements in performance for translation from Arabic to English.

Marine Carpuat, Yuval Marton, and Nizar Habash. “Reordering Post-verbal Subjects for Arabic-to-English Statistical Machine Translation”. Machine Translation Journal Special Issues on Machine Translation for Arabic. Editor-in-Chief: Andy Way, Guest Co-Editors: Nizar Habash and Hany Hassan. Paginated version: Volume 26, Issue 1 (2012), pages 105-120. Online version: Journal no. 10590, 8 November 2011

We study challenges raised by the order of Arabic verbs and their subjects in Statistical Machine Translation (SMT). We show that the boundaries of post-verbal subjects (VS) are hard to detect accurately, even with a state-of-the-art Arabic dependency parser. In addition, VS constructions have highly ambiguous reordering patterns when translated to English, and these patterns are very different for matrix (main clause) VS and non-matrix (subordinate clause) VS. Based on this analysis, we propose a novel method for leveraging VS information in SMT: we reorder VS constructions into SV order for word alignment. Unlike in previous approaches to source-side reordering, phrase extraction and decoding are performed using the original Arabic word order. This strategy significantly improves BLEU and TER scores, even on a strong large-scale baseline. Limiting reordering to matrix VS yields further improvements.
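The alignment-only reordering idea in the abstract above can be illustrated with a minimal Python sketch. It is not the implementation from the paper: the function names, the assumption that the subject span immediately follows the verb, and the transliterated toy sentence are all hypothetical. It shows a source sentence being reordered from VS to SV for word alignment, and the resulting alignment links being mapped back to the original word order, which is the order used for phrase extraction and decoding.

```python
def reorder_vs_to_sv(tokens, verb_index, subject_span):
    """Move a post-verbal subject in front of its verb.

    subject_span is (start, end) with end exclusive. This sketch assumes
    the subject directly follows the verb (a simplifying assumption).
    Returns the reordered tokens and, for each reordered position,
    the index of that token in the original sentence.
    """
    start, end = subject_span
    if not (0 <= verb_index and verb_index + 1 == start < end <= len(tokens)):
        raise ValueError("subject span must directly follow the verb")
    order = (
        list(range(verb_index))          # material before the verb
        + list(range(start, end))        # subject moves before the verb
        + [verb_index]                   # verb now follows its subject
        + list(range(end, len(tokens)))  # rest of the sentence
    )
    return [tokens[i] for i in order], order


def project_alignment(alignment, order):
    """Map (reordered_source_pos, target_pos) links back to original positions."""
    return sorted((order[src], tgt) for src, tgt in alignment)


# Toy example: "kataba al-walad al-risala" ~ "the boy wrote the letter"
source = ["kataba", "al-walad", "al-risala"]
reordered, order = reorder_vs_to_sv(source, verb_index=0, subject_span=(1, 2))

# Hypothetical links a word aligner might produce on the reordered source.
links_on_reordered = [(0, 0), (0, 1), (1, 2), (2, 3), (2, 4)]
links_on_original = project_alignment(links_on_reordered, order)
```

Keeping the `order` mapping is the point of the design: the aligner sees the SV order, while every downstream step can keep using indices into the original sentence.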

Yuval Marton. “Distributional Phrasal Paraphrase Generation for Statistical Machine Translation”. ACM Transactions on Intelligent Systems and Technology (TIST) special issue on paraphrasing. Eds.: Haifeng Wang, Bill Dolan, Idan Szpektor, Shiqi Zhao. In publication

Paraphrase generation has been shown to be useful for various natural language processing tasks, including statistical machine translation. A commonly used method for paraphrase generation is pivoting [Callison-Burch et al. 2006], which benefits from linguistic knowledge implicit in the sentence alignment of parallel texts, but has limited applicability due to its reliance on parallel texts. Distributional paraphrasing [Marton et al. 2009] has wider applicability and is more language-independent, but doesn’t benefit from any linguistic knowledge. Nevertheless, we show that distributional paraphrasing can yield greater gains. We report method improvements leading to higher gains than previously published – almost 2 BLEU points – and provide implementation details, complexity analysis, and further insight into this method.

Conferences

Yuval Marton, Nizar Habash and Owen Rambow. “Improving Arabic Dependency Parsing with Lexical and Inflectional Surface and Functional Features”. The 49th Annual Meeting of the Association for Computational Linguistics (ACL), Portland, Oregon, USA, June 19-24, 2011. Full paper.

We explore the contribution of lexical and morphological features to dependency parsing of Arabic, a morphologically rich language. Using controlled experiments, we find that definiteness, person, number, gender, and undiacritized lemma are most helpful for parsing on automatically tagged input. We further contrast the contribution of surface and functional features, and show that functional features for gender and number (e.g., “broken plurals”) and the related rationality feature improve over surface-based features. It is the first time these functional features are used for Arabic NLP.

 

Hao Li, Xiang Li, Heng Ji, and Yuval Marton. “Domain-Independent Novel Event Discovery and Semi-Automatic Event Annotation”. The 24th Pacific Asia Conference on Language, Information and Computation (PACLIC), Sendai, Japan, November 4-7, 2010. Full paper.

Information Extraction (IE) is becoming increasingly useful, but it is a costly task to discover and annotate novel events, event arguments, and event types. We exploit both monolingual texts and bilingual sentence-aligned parallel texts to cluster event triggers and discover novel event types. We then generate event argument annotations semi-automatically, framed as a sentence ranking and semantic role labeling task. Experiments on three different corpora -- ACE, OntoNotes and a collection of scientific literature -- have demonstrated that our domain-independent methods can significantly speed up the entire event discovery and annotation process while maintaining high quality.

 

Yuval Marton. “Improved Statistical Machine Translation Using Monolingual Text and a Shallow Lexical Resource for Hybrid Phrasal Paraphrase Generation”. The Ninth Conference of the Association for Machine Translation in the Americas (AMTA), Denver, Colorado, October 31 – November 5, 2010. Full paper.

Paraphrase generation is useful for various NLP tasks. But pivoting techniques for paraphrasing have limited applicability due to their reliance on parallel texts, although they benefit from linguistic knowledge implicit in the sentence alignment. Distributional paraphrasing has wider applicability, but doesn’t benefit from any linguistic knowledge. We combine a distributional semantic distance measure (based on a non-annotated corpus) with a shallow linguistic resource to create a hybrid semantic distance measure of words, which we extend to phrases. We embed this extended hybrid measure in a distributional paraphrasing technique, benefiting from both linguistic knowledge and independence from parallel texts. Evaluated in statistical machine translation tasks by augmenting translation models with paraphrase-based translation rules, we show our novel technique is superior to the non-augmented baseline and both the distributional and pivot paraphrasing techniques. We train models on both a full-size dataset and a simulated “low density” small dataset.

 

Marine Carpuat, Yuval Marton, and Nizar Habash. “Improving Arabic-to-English Statistical Machine Translation by Reordering Post-verbal Subjects for Alignment”. The 48th Annual Meeting of the Association for Computational Linguistics (ACL), Uppsala, Sweden, July 11–16, 2010. Short paper.

We study the challenges raised by Arabic verb and subject detection and reordering in Statistical Machine Translation (SMT). We show that post-verbal subject (VS) constructions are hard to translate because they have highly ambiguous reordering patterns when translated to English. In addition, implementing reordering is difficult because the boundaries of VS constructions are hard to detect accurately, even with a state-of-the-art Arabic dependency parser. We therefore propose to reorder VS constructions into SV order for SMT word alignment only. This strategy significantly improves BLEU and TER scores, even on a strong large-scale baseline and despite noisy parses.

 

Marine Carpuat, Yuval Marton, Nizar Habash. “Reordering Matrix Post-verbal Subjects for Arabic-to-English SMT”.  17th Conférence sur le Traitement Automatique des Langues Naturelles (TALN; Conference on Natural Language Processing), Montréal, Canada, July 19-22, 2010. Best paper award. Full paper.

We improve our recently proposed technique for integrating Arabic verb-subject constructions in SMT word alignment (Carpuat et al., 2010) by distinguishing between matrix (or main clause) and non-matrix Arabic verb-subject constructions. In gold translations, most matrix VS (main clause verb-subject) constructions are translated in inverted SV order, while non-matrix (subordinate clause) VS constructions are inverted in only half the cases. In addition, while detecting verbs and their subjects is a hard task, our syntactic parser detects VS constructions better in matrix than in non-matrix clauses. As a result, reordering only matrix VS for word alignment consistently improves translation quality over a phrase-based SMT baseline, and over reordering all VS constructions, in both medium- and large-scale settings. In fact, the improvements obtained by reordering matrix VS on the medium-scale setting remarkably represent 44% of the gain in BLEU and 51% of the gain in TER obtained with a word alignment training bitext that is 5 times larger.

Yuval Marton, Chris Callison-Burch and Philip Resnik. “Improved Statistical Machine Translation Using Monolingually-derived Paraphrases”. Conference on Empirical Methods in Natural Language Processing (EMNLP). Singapore, August 6-7, 2009. Full paper.

Untranslated words still constitute a major problem for Statistical Machine Translation (SMT), and current SMT systems are limited by the quantity of parallel training texts. Augmenting the training data with paraphrases generated by pivoting through other languages alleviates this problem, especially for the so-called "low density" languages. But pivoting requires additional parallel texts. We address this problem by deriving paraphrases monolingually, using distributional semantic similarity measures, thus providing access to larger training resources, such as comparable and unrelated monolingual corpora. We present what is to our knowledge the first successful integration of a collocational approach to untranslated words with an end-to-end, state-of-the-art SMT system, demonstrating significant translation improvements in a low-resource setting.

Yuval Marton, Saif Mohammad and Philip Resnik. “Estimating Semantic Distance Using Soft Semantic Constraints in Knowledge-Source / Corpus Hybrid Models”. Conference on Empirical Methods in Natural Language Processing (EMNLP). Singapore, August 6-7, 2009. Full paper.

We propose a corpus–thesaurus hybrid method that uses soft constraints to generate word-sense disambiguated distributional profiles (DPs) from coarser "concept DPs" (derived from a small Roget-like thesaurus) and sense-unaware traditional word DPs (derived from raw text). Not relying on a large lexical resource makes this method suitable also for resource-poorer languages or specific domains. Although it uses a knowledge source, the method is not vocabulary-limited: if the target word is not in the thesaurus, the method falls back gracefully on the word’s co-occurrence information. Experiments on word-pairs ranking by semantic distance show the new hybrid method to be superior to others.

David Chiang, Yuval Marton and Philip Resnik. “Online Large-Margin Training of Syntactic and Structural Translation Features”. Conference on Empirical Methods in Natural Language Processing (EMNLP 2008). Waikiki, Honolulu, Hawaii, October 25-27, 2008. Full paper.

 

Minimum-error-rate training (MERT) is a bottleneck for current development in statistical machine translation because it is limited in the number of weights it can reliably optimize. Building on the work of Watanabe et al., we explore the use of the MIRA algorithm of Crammer et al. as an alternative to MERT.  We first show that by parallel processing and exploiting more of the parse forest, we can obtain results using MIRA that match or surpass MERT in terms of both translation quality and computational cost. We then test the method on two classes of features that address deficiencies in the Hiero hierarchical phrase-based model: first, we simultaneously train a large number of Marton and Resnik’s soft syntactic constraints, and, second, we introduce a novel structural distortion model. In both cases we obtain significant improvements in translation performance. Optimizing them in combination, for a total of 56 feature weights, we improve performance by 2.6 BLEU on a subset of the NIST 2006 Arabic-English evaluation data.

 

Yuval Marton and Philip Resnik. “Soft Syntactic Constraints for Hierarchical Phrased-Based Translation”. The 46th Annual Meeting of the Association for Computational Linguistics (ACL). Columbus, Ohio, June 16-18, 2008.  Full paper.

In adding syntax to statistical MT, there is a tradeoff between taking advantage of linguistic analysis, versus allowing the model to exploit linguistically unmotivated mappings learned from parallel training data. A number of previous efforts have tackled this tradeoff by starting with a commitment to linguistically motivated analyses and then finding appropriate ways to soften that commitment. We present an approach that explores the tradeoff from the other direction, starting with a context-free translation model learned directly from aligned parallel text, and then adding soft constituent-level constraints based on parses of the source language. We obtain substantial improvements in performance for translation from Chinese and Arabic to English.

 

Yuval Marton, Ning Wu, and Lisa Hellerstein. "On Compression-Based Text Classification". Proceedings of the 27th European Conference on Information Retrieval (ECIR), Spain, March 2005. Full paper and errata note available.

Compression-based text classification methods are easy to apply, requiring virtually no preprocessing of the data. Most such methods are character-based, and thus have the potential to automatically capture non-word features of a document, such as punctuation, word-stems, and features spanning more than one word.  However, compression-based classification methods have drawbacks (such as slow running time), and not all such methods are equally effective. We present the results of a number of experiments designed to evaluate the effectiveness and behavior of different compression-based text classification methods on English text. Among our experiments are some specifically designed to test whether the ability to capture non-word (including super-word) features causes character-based text compression methods to achieve more accurate classification.
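To make the general idea of compression-based classification concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not one of the methods evaluated in the paper: it uses the standard-library `zlib` compressor as a stand-in, and the function names and toy training texts are hypothetical. A document is assigned to the category whose training text, when compressed together with the document, grows by the fewest bytes.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def classify(document: str, training_texts: dict[str, str]) -> str:
    """Pick the category whose training text helps compress the document most."""

    def extra_bytes(category: str) -> int:
        corpus = training_texts[category]
        # Extra bytes needed to encode the document after the category corpus.
        return compressed_size(corpus + " " + document) - compressed_size(corpus)

    return min(training_texts, key=extra_bytes)


# Toy training data (hypothetical); real experiments would use full corpora.
training_texts = {
    "sports": "the team won the match after scoring two goals in the second half",
    "cooking": "add the flour and sugar to the bowl then bake the cake in the oven",
}
predicted = classify("the team won the match after scoring two goals", training_texts)
```

Because the compressor works on raw bytes, no tokenization or other preprocessing is needed, which is the ease-of-use property the abstract mentions; recompressing the full category text for every document also hints at the slow running time it notes.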

 

Workshops

Yuval Marton, Ahmed El-Kholy and Nizar Habash. “Filtering Antonymous, Trend-Contrasting, and Polarity-Dissimilar Distributional Paraphrases for Improving Statistical Machine Translation”. EMNLP Sixth Workshop on Statistical Machine Translation (WMT), Edinburgh, UK, July 30-31, 2011. PDF

Paraphrases are useful for statistical machine translation (SMT) and natural language processing tasks. Distributional paraphrase generation is independent of parallel texts and syntactic parses, and hence is suitable also for resource-poor languages, but tends to erroneously rank antonyms, trend-contrasting, and polarity-dissimilar candidates as good paraphrases. We present here a novel method for improving distributional paraphrasing by filtering out such candidates. We evaluate it in simulated low and mid-resourced SMT tasks, translating from English to two quite different languages. We show statistically significant gains in English-to-Chinese translation quality, up to 1 BLEU from non-filtered paraphrase-augmented models (1.6 BLEU from baseline). We also show that yielding gains in translation to Arabic, a morphologically rich language, is not straightforward.

Yuval Marton, Nizar Habash, and Owen Rambow. “Improving Arabic Dependency Parsing with Inflectional and Lexical Morphological Features”. Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL) at Human Language Technologies: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Los Angeles, USA, June 1–6, 2010. PDF.

We explore the contribution of different lexical and inflectional morphological features to dependency parsing of Arabic, a morphologically rich language. We experiment with all leading POS tagsets for Arabic, and introduce a few new sets. We show that training the parser using a simple regular expressive extension of an impoverished POS tagset with high prediction accuracy does better than using a highly informative POS tagset with only medium prediction accuracy, although the latter performs best on gold input. Using controlled experiments, we find that definiteness (or determiner presence), the so-called phi-features (person, number, gender), and undiacritized lemma are most helpful for Arabic parsing on predicted input, while case and state are most helpful on gold.

Chris Dyer, Hendra Setiawan, Yuval Marton, and Philip Resnik. “The University of Maryland Statistical Machine Translation System for the Fourth Workshop on Machine Translation”. EACL 2009 Fourth Workshop On Statistical Machine Translation, March 2009, Athens, Greece. PDF.

This paper describes the techniques we explored to improve the translation of news text in the German-English and Hungarian-English tracks of the WMT09 shared translation task. Beginning with a conventional hierarchical phrase-based system, we found benefits for using word segmentation lattices as input, explicit generation of beginning and end of sentence markers, minimum Bayes risk decoding, and incorporation of a feature scoring the alignment of function words in the hypothesized translation. We also explored the use of monolingual paraphrases to improve coverage, as well as co-training to improve the quality of the segmentation lattices used, but these did not lead to improvements.

 

Thesis / Dissertation

Yuval Marton. “Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models”. Ph.D. Dissertation, Department of Linguistics, University of Maryland, October 2009.  Official format or paper-saving single-space format.

This dissertation focuses on effective combination of data-driven natural language processing (NLP) approaches with linguistic knowledge sources that are based on manual text annotation or word grouping according to semantic commonalities. I gainfully apply fine-grained linguistic soft constraints – of syntactic or semantic nature – on statistical NLP models, evaluated in end-to-end state-of-the-art statistical machine translation (SMT) systems. The introduction of semantic soft constraints involves intrinsic evaluation on word-pair similarity ranking tasks, extension from words to phrases, application in a novel distributional paraphrase generation technique, and an introduction of a generalized framework of which these soft semantic and syntactic constraints can be viewed as instances, and in which they can be potentially combined.

Fine granularity is key in the successful combination of these soft constraints, in many cases. I show how to softly constrain SMT models by adding fine-grained weighted features, each preferring translation of only a specific syntactic constituent. Previous attempts using coarse-grained features yielded negative results. I also show how to softly constrain corpus-based semantic models of words (“distributional profiles”) to effectively create word-sense-aware models, by using semantic word grouping information found in a manually compiled thesaurus. Previous attempts, using hard constraints and resulting in aggregated, coarse-grained models, yielded lower gains.

A novel paraphrase generation technique incorporating these soft semantic constraints is then also evaluated in an SMT system. This paraphrasing technique is based on the Distributional Hypothesis. The main advantage of this novel technique over current “pivoting” techniques for paraphrasing is the independence from parallel texts, which are a limited resource. The evaluation is done by augmenting translation models with paraphrase-based translation rules, where fine-grained scoring of paraphrase-based rules yields significantly higher gains.

The model augmentation includes a novel semantic reinforcement component: in many cases there are alternative paths of generating a paraphrase-based translation rule. Each of these paths reinforces a dedicated score for the “goodness” of the new translation rule. This augmented score is then used as a soft constraint, in a weighted log-linear feature, letting the translation model learn how much to “trust” the paraphrase-based translation rules.

The work reported here is the first to use distributional semantic similarity measures to improve performance of an end-to-end phrase-based SMT system. The unified framework for statistical NLP models with soft linguistic constraints enables, in principle, the combination of both semantic and syntactic constraints – and potentially other constraints, too – in a single SMT model.

Yuval Marton. “Character-Based and Word-Based Classification: Experiments with Compression Methods and a Word-Based Language Modeling Method”. Master’s thesis, NYU/Poly, CIS Department, 2004.

Text classification is the task of taking a set of input documents that are labeled by category, and using that input information to classify other, unlabeled documents. There are many approaches to text classification. A somewhat non-standard approach is to use compression. […]

 

Manuscripts

Yuval Marton. “What Can we Learn about Language Processing and Representation from Word Contour Effects on Letter Order Perception and Word Recognition in Right and Left Visual Fields?” Qualifying paper (Ling895), Department of Linguistics, University of Maryland, May 2007. Manuscript.

When we read a word, we typically read it all at once, not letter by letter. This assumption can be verified in laboratory conditions, when each word is displayed for less than 150ms, ensuring that subjects have no time to saccade or otherwise move their eyes (following Rayner’s findings [Rynr86]). Given that all letters in a word are perceived at the same time, readers need to determine and encode letter order, in order to correctly identify the word. Whitney [Wtny04b] has argued that letter encoding, and in particular letter order encoding, is done using abstract representations. We will show that word recognition is not done solely with abstract letter symbols; low-level visual properties of the written word – specifically its contour (operationalized here as the existence and location of ascending and descending letters) – are also used for this task. We argue that if both contour information and abstract letter symbols are used for word recognition, they do not combine in a simple additive manner. We also argue that vision provides a unique contribution to language – parallel input processing, beyond mere equivalence to (serial) sound sequences. We will show differences and similarities of performance in left and right visual fields (LVF and RVF) in Hebrew, in different word contour and word frequency conditions, and contrast predictions of three theories of the well-known RVF advantage: innate left hemispheric advantage for language processing (e.g., [YE85]), acquired retinal/cortical RVF expertise (e.g., [Nzr00]), and computational neural network assumptions ([LW05]).

 

 

NLP Resources and Tools

 

Columbia University Arabic (MSA) syntactic dependency models with form-based and functional morphological features (to be released with my CL article, 2012)

Please email me (or better: email Owen Rambow, Nizar Habash and me) to request permission.

 

Columbia University Arabic (MSA) syntactic dependency data with form-based and functional morphological features (2011)

With a training / testing split.

Previous versions were used in my NAACL SPMRL (2010) and ACL (2011) publications.

In order to use the data, you need to have BOTH of the following:

1. License from LDC to use the Penn Arabic Treebank part 3 (v3.1)

2. License or written permission from CCLS / Columbia University to use the Arabic functional features (functional gender, functional number and rationality); once you have obtained the LDC license, please email me (or better: email Owen Rambow, Nizar Habash and me) to request the CCLS / Columbia University permission.

 

Columbia University Arabic (MSA) syntactic dependency data for GALE (2009-2010)

Requires a GALE license; available to GALE participants.

 

 

Honors and Awards

 

Best paper award, 17th Conference on Natural Language Processing (TALN), 2010

 

 

 

Teaching

 

TA for Computational Linguistics II (Ling647 / CMSC828R), taught by Philip Resnik, Spring 2006

 

TA for Introductory Linguistics (Ling200), taught by Tonia Bleam, Spring 2008

 

TA for Introduction to Linguistics, taught by Tanya Reinhart, Tel Aviv University (during my undergraduate senior year)

 

 

Tutorial, Workshop, and Conference Organization

 

General Co-chair (organizing committee member) of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages (SP-Sem-MRL 2012), to take place during the ACL workshop period, July 12-14, 2012 (exact date TBD), in Jeju Island, Republic of Korea

 

Publication Chair of the First Joint Conference on Lexical and Computational Semantics (*SEM), co-located with NAACL-HLT 2012, June 7-8, Montreal, Canada

 

Tutorial session: “On-Demand Distributional Paraphrasing”, at NAACL-HLT 2012, Montreal, Canada, June 3, 2012 (Accepted)

 

 

Additional Academic Activities

 

I took part (or still take part) in the following:

 

Review service:

- Machine Translation (MT) Journal

- The Association for Computing Machinery: Transactions on Asian Language Information Processing (ACM TALIP) Journal

- The Association for Computing Machinery Transactions on Intelligent Systems and Technology (ACM TIST) Journal

- The Institute of Electrical and Electronics Engineers’ Transactions on Knowledge and Data Engineering (IEEE’s TKDE) Journal

- Annual meetings of the Association for Computational Linguistics (ACL)

- Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)

- Conference on Empirical Methods in Natural Language Processing (EMNLP)

- International Conference on Computational Linguistics (CoLing)

- The international conference on Language Resources and Evaluation (LREC)

- Workshop on Statistical Machine Translation (WMT)

- The Association for Machine Translation in the Americas (AMTA) Student Research Workshop

 

Colloquium Committee, member, Fall 2005 – Spring 2006.

 

Department of Linguistics’ Semantics Search Committee, member, Fall 2005 – Spring 2006.

 

NACS Program (The Program in Neuroscience and Cognitive Science at the University of Maryland):  I received the NACS Program Certificate in August 2008.

 

Psycholinguistic experiments at the CNL Lab, 2005 – 2008.

 

LSA Institute, MIT, Summer 2005.

 

 

Other Activities

 

Human (me!) translation: Example 1, Example 2.

 

 

 

The Future

 

Under construction! (Permanently)

 

 

 
