arXiv: v1 [cs.CL] 25 Oct 2017


Linking Tweets with Monolingual and Cross-Lingual News using Transformed Word Embeddings

Aditya Mogadala 1, Dominik Jung 2 and Achim Rettinger 1

1 Institute AIFB, Karlsruhe Institute of Technology, Germany, aditya.mogadala@kit.edu, rettinger@kit.edu
2 Institute IISM, Karlsruhe Institute of Technology, Germany, dominik.jung2@kit.edu

Abstract. Social media platforms have grown into an important medium to spread information about an event published by the traditional media, such as news articles. Grouping such diverse sources of information that discuss the same topic from varied perspectives provides new insights. But the gap in word usage between informal social media content such as tweets and diligently written content (e.g. news articles) makes such assembling difficult. In this paper, we propose a transformation framework to bridge the word usage gap between tweets and online news articles across languages by leveraging their word embeddings. Using our framework, word embeddings extracted from tweets and news articles are aligned closer to each other across languages, thus facilitating the identification of similarity between news articles and tweets. Experimental results show a notable improvement over baselines for monolingual tweets and news articles comparison, while new findings are reported for cross-lingual comparison.

1 Introduction

On the web, the growth of social media platforms has offered numerous opportunities along with several challenges to solve. Twitter is one such social media platform that allows its users to share text messages of 140 characters (popularly known as tweets) in multiple languages with their friends or followers. Tweets may contain personal information or a confined description of an event motivated by the traditional media such as online news articles. Studies [1] have shown that 85% of the tweets are news affiliated. Though some tweets acknowledge news articles by explicitly linking them, most of them do not.
This implicit linking of tweets with news topics provides novel insights. For example, most of the traditional media companies that publish online news write only facts about an event. However, identifying relevant tweets for the corresponding news will add people's opinions. Furthermore, attaching tweets to the news articles will allow a multi-dimensional view of controversial topics, thus empowering the editor of an article to modify upcoming or following articles based on veracity.

However, the differences in word usage between informal tweets and attentively drafted writings like news articles make this linking challenging. Nevertheless, different approaches have been pursued to solve the problem. Initially, monolingual comparison of tweets with news articles was achieved by comprehending commonality between the topics using unsupervised topic models [2]. Although a scalable approach, it fails to capture the importance of words and their differences across corpora. A graph-based latent variable model [3] was further introduced for finding short text correlations using microblog hashtags and named entities from news articles. Even though it addresses earlier drawbacks by giving importance to keywords such as named entities in news articles, it still ignores a large chunk of the remaining vocabulary. Krestel et al. [4] followed a different path by posing the comparison of tweets with news as a relevance assessment problem and designed a supervised binary classifier with many hand-crafted features. Although supervised, the hand-crafted features limit its scalability. Further, the aforementioned approaches ignore the multilingual aspect of the published news. Nowadays most of the online news about any event is multilingual. Identification of a news article belonging to a single language is not enough to cover the collective views about an event.

In this paper, we overcome the limitations of earlier approaches and propose a new scalable framework to support comparison of tweets with monolingual and cross-lingual news articles. Our framework leverages monolingual [5] and bilingual [6] word embeddings acquired from tweets and news articles as basic units for bridging the word usage gap across these collections. Furthermore, a non-linear transformation of tweet word embeddings is performed to bring them closer to the news article word embeddings using manifold alignment with Procrustes analysis [7]. Work closely related to our approach is by Tan et al. [8], who perform lexical comparison of words observed in tweets and Wikipedia belonging to the same language with only a linear transformation, while we perform a non-linear transformation and also across languages. Our three main contributions are summarized as follows:

1. We propose an approach to classify tweets as to how relevant they are for a given news article in more than one language.
2. New evaluation corpora are created for monolingual and cross-lingual tweets to news article comparison.
3. Lexical and task-specific evaluation results are presented on two different datasets.

2 Related Work

Most of our research is closely related to the work that identifies relevance of tweets to online news or performs event detection. We divide the related works into separate categories.

2.1 Event Detection in Tweets

Analyzing information flow about the events as they emerge is an important aspect of event detection in tweets. Several works used this information in various ways. Some approaches [9] collocated emerging events and classified them into different categories, while some [10] found sentiment from the detected events. Others detected events as trends to track public health [11], political abuse [12] and crisis communication [13].

2.2 News and Relevant Tweets

Several approaches have been explored to identify relevant noisy tweets for lengthy news articles. Initially, a semantic enrichment framework [14] was built to link news articles and tweets by identifying possible correlations to provide personalized news recommendations. Jin et al. [15] viewed the problem from a different perspective and introduced a dual latent Dirichlet allocation model to jointly learn two sets of topics. Later, a more sophisticated unsupervised topic modeling [2] approach was proposed for finding overlap of topic distribution between tweets and news articles obtained from the New York Times.

2.3 Distributed Representations

Distributed word representations [5] have shown significant improvements in many NLP tasks [16]. Different variations of them such as bilingual [6] and polylingual [17] are also obtained by projecting a pair of languages or multiple languages into a shared semantic space. Also, word representations were extended to meet the requirements of short or noisy text [18,19].

3 Monolingual Word Usage Characteristics

To understand the characteristics of word usage, initially news articles in German and English are collected between January, 2015 and December. To have a good overlap of topics, keywords are extracted from news articles to be used as queries for collecting tweets belonging to the same period with the Twitter search API. Acquired tweets are then cleaned by removing URLs, user mentions, the # symbol of hashtags, and all re-tweets.
Additionally, GloVe is used to obtain word embeddings with 400 dimensions for both collections. The size of the final document sets and the vocabulary extracted from GloVe are listed in Table 1.

Collection   Language   Documents   Vocabulary
News         English
News         German
Tweets       English
Tweets       German
Table 1. Collection Sizes

Word embeddings for each collection are now used to comprehend the word usage characteristics. Initially, the top 10 common and frequent words observed in both collections are visualized with t-SNE [20]. We observed that the same words learned separately from the tweets and news collections are highly separated. Furthermore, to apprehend the difference in slang, abbreviations etc. in both collections, we use the 5000 most frequent common vocabulary terms (both English and German) to perceive differences among their nearest neighbors. Based on the rank biased overlap (RBO) measure [21,8], which provides a comparison between incomplete and indefinite rankings, we observe minimal average RBO values for English and German with parameters ϕ = 0.9 and k = 100. This exhibits the difference in word usage among both collections and motivates us to transform word embeddings learned with tweets closer to word embeddings learned using news articles or vice versa.

4 Transformed Word Embeddings (TWE)

Differences in the embeddings learned from two different collections such as tweets and news require bridging with an embedding transformation. In this section, we formulate the problem and present our approach for monolingual and cross-lingual transformation.

4.1 Problem Formulation

Let $T_w^l = \{t_{w_1}^l, t_{w_2}^l, \ldots, t_{w_i}^l, \ldots, t_{w_n}^l\}$ and $T_e^l = \{t_{e_1}^l, t_{e_2}^l, \ldots, t_{e_i}^l, \ldots, t_{e_n}^l\}$ represent the set of words and their corresponding embeddings extracted from the tweet collection respectively, where $l$ is the language of the tweets, $n$ is the size of the vocabulary and each embedding $t_{e_i}^l \in \mathbb{R}^{1 \times d}$. Similarly, $N_w^l = \{n_{w_1}^l, n_{w_2}^l, \ldots, n_{w_i}^l, \ldots, n_{w_m}^l\}$ and $N_e^l = \{n_{e_1}^l, n_{e_2}^l, \ldots, n_{e_i}^l, \ldots, n_{e_m}^l\}$ represent the set of words and their embeddings of the news corpora respectively, where $l$ is the language of the news corpora, $m$ is the size of the vocabulary and each embedding $n_{e_i}^l \in \mathbb{R}^{1 \times d}$. Formally, our research question is to identify common words $\{T_w^l, N_w^l\} = \{t_{w_i}^l, n_{w_i}^l\}_{i=1}$ and transform word embeddings in the tweet collection ($T_e^l$) closer to the embeddings of the news collection ($N_e^l$) or vice versa.
This transformation is based on the assumption that a transformation relationship exists between the vectors for the frequent words of each collection. Some approaches [8] have earlier performed this simple transformation only when the tweets and the formal language corpora (e.g. news, Wikipedia) belong to the same language. But it is non-trivial if the language of tweets and formal language corpora differs. In the following sections, we present the transformation of tweet embeddings closer to the monolingual or cross-lingual news embeddings.
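The rank biased overlap used in Section 3 to compare nearest-neighbor lists can be illustrated with a short sketch. The version below is only the truncated form of the measure of Webber et al. [21], summed to depth k with persistence parameter p (written ϕ in the text). Whether the reported values used this truncated sum or an extrapolated variant is not stated here, so the function name `rbo_truncated` and the toy neighbor lists are assumptions made for illustration.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9, k=100):
    """Truncated rank biased overlap of two ranked lists.

    Computes (1 - p) * sum_{d=1..k} p^(d-1) * A_d, where A_d is the
    fraction of items shared by the two top-d prefixes.
    Assumes that neither list contains duplicate items.
    """
    depth = min(k, len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    overlap = 0      # size of the intersection of the two prefixes
    score = 0.0
    for d in range(1, depth + 1):
        item_a, item_b = ranking_a[d - 1], ranking_b[d - 1]
        if item_a == item_b:
            overlap += 1
        else:
            # A new item counts once it has appeared in the other prefix.
            if item_a in seen_b:
                overlap += 1
            if item_b in seen_a:
                overlap += 1
        seen_a.add(item_a)
        seen_b.add(item_b)
        score += (p ** (d - 1)) * overlap / d
    return (1 - p) * score


# Hypothetical nearest neighbors of one word in the two collections.
neighbors_in_tweets = ["lol", "omg", "haha", "news", "today"]
neighbors_in_news = ["report", "news", "today", "minister", "lol"]
print(rbo_truncated(neighbors_in_tweets, neighbors_in_news, p=0.9, k=5))
```

Averaging this score over the common vocabulary terms gives one number per language. A low average indicates that the same word has different neighborhoods in the two collections.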

4.2 Monolingual-TWE

Earlier approaches [8] assume only a linear relationship between embeddings from different collections to perform transformation. Sometimes the relationship needs to handle disturbances such as scaling and rotation. To cater for such issues, we leverage manifold alignment using Procrustes analysis [7] to transform word embeddings of tweets closer to word embeddings of news articles with a three step procedure.

Learning low-dimensional embeddings is a prerequisite for transformation. We already have low-dimensional embeddings $\{T_e^l, N_e^l\}$ of words observed in both the tweet and news collections. To find the optimal values of transformation, Procrustes superimposition is done by translating, rotating and scaling the objects (i.e. rows of $T_e^l$ are transformed to make them similar to the rows of $N_e^l$). Transformation is achieved by:

Translation: Taking the mean of all the members of each set to make the centroids of the two sets, computed over the rows $T_{e_i}^l$ and $N_{e_i}^l$, lie at the origin.

Scaling and Rotation: The rotation and scaling that maximize the alignment are given by an orthogonal matrix ($Q$) and a scaling factor ($j$). They are obtained by solving the orthogonal Procrustes problem [22], given by Equation 1.

$$\arg\min_{j,Q} \left\| N_e^l - \hat{T}_e^l \right\|_F \qquad (1)$$

where $\hat{T}_e^l$ is a matrix of transformed $T_e^l$ values given by $j T_e^l Q$ and $\|\cdot\|_F$ is the Frobenius norm, constrained over $Q^T Q = I$. If $T_w^l$ represents the words of the transformed low-dimensional embeddings $\hat{T}_e^l$, then the final sets $\{T_w^l, N_w^l\}$ contain closer correspondence. To understand the effectiveness of this transformation, we perform experiments similar to those of Section 3 in Section 6.3.

4.3 Cross-Lingual-TWE

Comparison of vocabulary obtained from tweets in one language ($l_1$) with the vocabulary of news articles in another language ($l_2$) is not straightforward. To address this concern, we propose a two step approach. In the first step, news articles from two different languages are acquired to learn bilingual distributed word representations (i.e. bilingual embeddings). The aim of bilingual embeddings is to capture linguistic regularities across languages in a common semantic space such that English and German words (e.g. wonderful and wunderbar) are neighbors in the t-SNE visualization, thus bridging the language gap.

In the second step, cross-lingual transformation is achieved between word embeddings obtained from tweets in $l_1$ and word embeddings of news articles in $l_2$. As bilingual word embeddings of news articles in $l_1$ also share linguistic regularities from $l_2$, mapping word embeddings of tweets closer to the bilingual word embeddings of news articles of $l_1$ will also help to incorporate linguistic regularities of $l_2$. Consequently, the transformation is attained in a similar way as in Section 4.2 between word embeddings of tweets and bilingual word embeddings of news articles belonging to the same language.

Step-1 To learn bilingual embeddings, we leverage the approach of Gouws et al. [6] as it is fast and scalable. It jointly optimizes the monolingual objective $M(\cdot)$ with the cross-lingual objective $\varphi(\cdot)$ (i.e. the cross-lingual regularization term) to find the overall loss $L(\cdot)$. Documents in the news collection of languages $l_1$ and $l_2$ are used to learn monolingual models along with the cross-lingual regularization term learned with parallel corpora (e.g. Europarl-v7). The overall loss function $L(\cdot)$ is given by Equation 2.

$$L(\cdot) = \min_{\theta^{l_1},\theta^{l_2}} \sum_{l \in \{l_1,l_2\}} \sum_{C^l} M^l(w_t, h; \theta^l) + \lambda \varphi(\theta^{l_1}, \theta^{l_2}) \qquad (2)$$

$\varphi(\cdot)$ eliminates the need for word alignment and makes an assumption that each word observed in the document of language $l_1$ can potentially find its alignment in the document of language $l_2$. Thus, Equation 2 is now modified into Equation 3.

$$L(\cdot) = \min_{\theta^{l_1},\theta^{l_2}} \sum_{l \in \{l_1,l_2\}} \sum_{C^l} M^l(w_t, h; \theta^l) + \lambda \left\| \frac{1}{m} \sum_{w_i \in l_1}^{m} V_i^{l_1} - \frac{1}{n} \sum_{w_i \in l_2}^{n} V_i^{l_2} \right\|^2 \qquad (3)$$

where $V^{l_1}$ and $V^{l_2}$ are monolingual word vectors of the words in documents of languages $l_1$ and $l_2$ respectively and $C^l$ is a monolingual corpus (e.g. News). $w_t$ is the predicted word in the context $h$ of a monolingual model.

Step-2 We follow a similar procedure as in Section 4.2 but with a different set of embeddings. The low-dimensional embeddings used initially are $\{T_e^{l_1}, N_e^{l_1}\}$ of words observed in both the tweet and news collections belonging to the same language. Here, $N_e^{l_1}$ represents bilingual embeddings. Transformation is now achieved by translating, rotating and scaling the objects (i.e. rows of $T_e^{l_1}$ are transformed to make them similar to the rows of $N_e^{l_1}$) using the same procedure as described in Section 4.2.
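The translation, scaling and rotation steps shared by Monolingual-TWE and Step-2 of Cross-Lingual-TWE can be sketched with NumPy. This is a minimal illustration of the standard closed-form solution of the scaled orthogonal Procrustes problem via singular value decomposition, not the authors' implementation. The function names, the synthetic matrices and the choice to add the news centroid back after alignment are assumptions.

```python
import numpy as np


def fit_procrustes(T, N):
    """Fit j (scale) and Q (orthogonal) minimizing ||N0 - j * T0 @ Q||_F.

    T and N are (num_common_words, d) arrays whose rows are paired by word.
    T0 and N0 are the centered versions (translation step).
    """
    t_mean, n_mean = T.mean(axis=0), N.mean(axis=0)
    T0, N0 = T - t_mean, N - n_mean
    # Rotation: Q = U V^T from the SVD of T0^T N0.
    U, singular_values, Vt = np.linalg.svd(T0.T @ N0)
    Q = U @ Vt
    # Scaling: ratio of the singular value sum to the squared norm of T0.
    j = singular_values.sum() / np.sum(T0 ** 2)
    return j, Q, t_mean, n_mean


def apply_procrustes(X, j, Q, t_mean, n_mean):
    """Map tweet embeddings X into the news space (T2N direction).

    Adding n_mean back is a design choice of this sketch so that the
    output is directly comparable with the uncentered news embeddings.
    """
    return j * (X - t_mean) @ Q + n_mean


# Synthetic stand-ins for embeddings of the common vocabulary.
rng = np.random.default_rng(0)
tweet_emb = rng.normal(size=(200, 25))
rotation, _ = np.linalg.qr(rng.normal(size=(25, 25)))
news_emb = 1.7 * tweet_emb @ rotation + 0.5 + 0.01 * rng.normal(size=(200, 25))

j, Q, t_mean, n_mean = fit_procrustes(tweet_emb, news_emb)
aligned = apply_procrustes(tweet_emb, j, Q, t_mean, n_mean)
print(np.linalg.norm(news_emb - aligned) < np.linalg.norm(news_emb - tweet_emb))
```

Swapping the two arguments of `fit_procrustes` gives the opposite direction (N2T). For the cross-lingual case the same call would be made with the bilingual news embeddings in place of the monolingual ones.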

5 Experimental Setup

To evaluate our approach, we built a dataset for cross-language and monolingual pairwise tweet and news article relevance assessment. Also, we used an existing monolingual comparison corpus to compare with other approaches.

5.1 Corpus Creation

The unavailability of datasets for comparing news articles with tweets in different languages compelled us to create our own. We created a gold standard dataset for monolingual and cross-lingual comparison across collections by acquiring some more tweets and news articles mainly in English and German in the same way as described in Section 3. Tweets with a single URL link to any news article are collected and carefully evaluated to check that they do not simply represent the news title or summary. If they only represent the news title or summary then they are considered trivial and are removed. After basic preprocessing, using the keyword Grexit (the Greek exit from the European Union) around 18 tweets and 18 news articles (both English and German) are selected for further human evaluation.

5.2 Human Evaluation

The goal of the human evaluation is to get pairwise comparison scores between tweets and news. Thus, each participant had to rate a pair of documents with respect to their semantic similarity. Three different annotators who have English (E) and German (G) language skills were chosen for comparing pairs of tweets and news based on the scores listed in Table 2.

Score   Type         Description
0       Dissimilar   Tweet and news article are not about the same topic.
1       Related      Tweet and news article share a topic but important ideas in the news are not represented in the tweet.
2       Similar      Tweet and news article are about the same topic and important ideas in the news are represented in the tweet.
Table 2. Similarity Scores

At the end, a list of 628 relevance judgments (i.e. 162 between (E)Tweets and (E)News, 162 between (E)Tweets and (G)News and so on) was produced. A significance test with Kendall's τ is computed to test the consistency among user judgments.
Results suggested that there is no significant difference in the score pairs of users (0.05 significance level). Specifically, the results showed that users have a similar understanding of the similarity assessment. To obtain the final score for each pair, similar to SemEval semantic similarity tasks, the arithmetic mean was calculated over all user ratings. We term this resource Dataset-1. This dataset provides a more fine-grained comparison than other datasets [4] that provide only binary relevance.

5.3 Other Datasets

Evaluation of monolingual comparison is also performed on other existing resources such as Krestel et al. [4]. This dataset consists of 1600 relevance judgments constituting 17 news articles covering different topics, with the tweets labeled as relevant or irrelevant for each news article. We term this resource Dataset-2.

5.4 Evaluation Metrics

For many pairwise semantic similarity tasks, statistical correlation based measures have been used. Here, we use the Pearson correlation coefficient (r) to evaluate our approaches on the dataset we created, while accuracy is used for the other dataset.

6 Experimental Results

In this section, we present our experimental results on different datasets with variation in parameters.

6.1 Baselines

Two different baselines are used to compare with our approach.

Latent Dirichlet Allocation (LDA) Most of the earlier research [2,3] has shown significant interest in comparing news and tweets with LDA and its variations. We use the polylingual topic model [23] trained on English and German Wikipedia with 100 topics to support multiple languages. Similarity between a tweet and a news article, each represented as a topic vector, is measured using cosine similarity.

WTMF-G Weighted Textual Matrix Factorization on Graphs (WTMF-G) [3] is a baseline that compares tweets and news based on a graph connected by hashtags, named entities or temporal information. To train the WTMF-G model we used the regularization coefficient (λ = 20), weight of missing words $w_n = 0.01$, number of neighbors (k = 4) and link weights (δ = 3) as suggested in earlier research.
A latent dimension of 100 is used to represent tweets and news, while similarity between them is calculated using cosine similarity.
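The scoring pipeline shared by the baselines and the evaluation in Section 5.4 can be sketched as follows: a cosine similarity is computed for each tweet and news pair, the annotator ratings are averaged into a gold score, and the two lists are compared with the Pearson correlation coefficient. All vectors and ratings below are invented placeholders, and the function names are assumptions of this sketch.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc @ xc) * (yc @ yc))
    if denom == 0:
        raise ValueError("Pearson r is undefined for constant input")
    return float(xc @ yc / denom)


# Placeholder ratings: 3 annotators (rows) x 4 tweet-news pairs (columns),
# using the 0/1/2 scale of Table 2.
ratings = np.array([[0, 1, 2, 2],
                    [0, 2, 2, 1],
                    [1, 1, 2, 2]])
gold_scores = ratings.mean(axis=0)  # arithmetic mean over annotators

# Placeholder document vectors (e.g. topic or latent vectors) per pair.
pairs = [([1.0, 0.0, 0.2], [0.0, 1.0, 0.1]),
         ([0.5, 0.5, 0.0], [0.9, 0.2, 0.1]),
         ([0.3, 0.7, 0.1], [0.3, 0.6, 0.2]),
         ([0.8, 0.1, 0.4], [0.7, 0.2, 0.3])]
system_scores = [cosine_similarity(t, n) for t, n in pairs]

print(pearson_r(system_scores, gold_scores))
```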

6.2 TWE Implementation

The major parameters that affect training of GloVe are the dimensionality of word embeddings and the size of the word context window. We choose 25, 50, 100, 200 and 400 word embedding dimensions and a context window of 5 words on the left and right. Similarly, for learning bilingual word embeddings we later used the BilBOWA tool to learn the same embedding dimensions as the former with a 5 word left context window and the entire English-German Europarl-v7 as the parallel data. In both cases, words with a count of less than 2 in the entire corpus are discarded.

6.3 Monolingual Comparison

Before comparing monolingual news and tweets, we estimate the quality of the embedding transformation achieved with Monolingual-TWE by performing similar experiments as in Section 3. The transformation can be either from tweets to news (T2N) or in the opposite orientation (N2T). Though they are different transformations, we observed that they produce similar t-SNE visualizations. Also, there is a slight decrease in distance between common words across collections as compared to without transformation. The average RBO measure using the top 5000 frequent terms observed in both tweets and news collections in German and English is recalculated to perceive the refinement. We perceived an improvement of 24.4% and 21.2% for English and German respectively.

Now, tweets and news articles in Dataset-1 and Dataset-2 are represented as the tf-idf weighted average of transformed word embeddings. These representations are used as input to an SVM classifier with default parameters to calculate accuracy, and to cosine similarity for finding the Pearson correlation. Furthermore, top performing embedding dimensions are identified based on Pearson correlation and accuracy measures using validation data of the datasets. Figure 1 and Figure 2 show the comparison of results with ((T2N)TWE and (N2T)TWE) and without (Non-TWE) transformation on different datasets. Once the top performing embedding dimensions are identified, testing data is used to compare different approaches with diverse measures in Table 3 and Table 4.

Fig. 1. Effect of Embedding Dimensions (Dataset-1)
Fig. 2. Effect of Embedding Dimensions (Dataset-2)

Method                      Dim   r
German
  No-Transformation
  LDA-PTM [23]
  WTMF-G [3]
  (T2N)Monolingual-TWE
  (N2T)Monolingual-TWE
English
  No-Transformation
  LDA-PTM [23]
  WTMF-G [3]
  (T2N)Monolingual-TWE
  (N2T)Monolingual-TWE
Table 3. Monolingual Tweets and News Comparison

6.4 Cross-Lingual Comparison

For the cross-lingual comparison, we follow a similar procedure as in Section 6.3. Since news word embeddings incorporate bilingual information from both German and English, calculation of the RBO measure between tweets and news without transformation is not appropriate. Hence, we calculate the RBO measure after transformation to verify that it satisfies the minimum threshold of 0.328, which in general fetches satisfactory results [8]. Now, to compare tweets and news belonging to the dataset listed in Section 5.1 across languages, we estimate the top performing embedding dimension based on the Pearson correlation measure using the validation data of the dataset. Figure 3 shows the comparison of results with (TWE) and without (Non-TWE) transformation. Once the top performing embedding dimension is identified, testing data is used to compare different approaches as provided in Table 5.
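The document representation used in the monolingual and cross-lingual comparisons, a tf-idf weighted average of word embeddings, can be sketched as below. The exact tf-idf variant is not specified in the text, so the smoothed form `log(N / df) + 1` and raw term counts are assumptions of this sketch, as are the function names and the toy vocabulary.

```python
import math
from collections import Counter

import numpy as np


def idf_weights(tokenized_docs):
    """Inverse document frequency per token (assumed variant: log(N/df) + 1)."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(tok for doc in tokenized_docs for tok in set(doc))
    return {tok: math.log(n_docs / df) + 1.0 for tok, df in doc_freq.items()}


def doc_vector(tokens, embeddings, idf, dim):
    """tf-idf weighted average of the embeddings of the known tokens.

    Tokens without an embedding or an idf weight are skipped; a document
    with no known tokens is mapped to the zero vector.
    """
    term_freq = Counter(t for t in tokens if t in embeddings and t in idf)
    if not term_freq:
        return np.zeros(dim)
    weights = np.array([count * idf[t] for t, count in term_freq.items()])
    vectors = np.stack([embeddings[t] for t in term_freq])
    return weights @ vectors / weights.sum()


# Toy (hypothetical) transformed embeddings and documents.
embeddings = {"greece": np.array([1.0, 0.0]),
              "exit": np.array([0.0, 1.0]),
              "euro": np.array([0.7, 0.7])}
docs = [["greece", "exit", "euro"], ["greece", "euro"], ["greece"]]
idf = idf_weights(docs)

tweet_vec = doc_vector(["greece", "exit", "lol"], embeddings, idf, dim=2)
news_vec = doc_vector(docs[0], embeddings, idf, dim=2)
print(tweet_vec, news_vec)
```

The resulting vectors can be compared with cosine similarity for the correlation based evaluation, or passed as features to a classifier for the accuracy based evaluation.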

Method                        Dim   Accuracy (%)
LDA-PTM [23]
Boosting [4]
(T2N)Monolingual-TWE+SVM
(N2T)Monolingual-TWE+SVM
Table 4. Accuracy (English)

Fig. 3. Effect of Embedding Dimensions (Cross-Lingual)

Method                       r
(E)Tweets - (G)News
  LDA-PTM [23]
  (T2N)Cross-Lingual-TWE
  (N2T)Cross-Lingual-TWE
(G)Tweets - (E)News
  LDA-PTM [23]
  (T2N)Cross-Lingual-TWE
  (N2T)Cross-Lingual-TWE
Table 5. Cross-Lingual Tweets and News Comparison With 100-Dimensions

7 Discussion

We start our analysis with the results observed in Table 3. It can be seen that Monolingual-TWE (either T2N or N2T) achieved a commendable improvement over other approaches. However, the values for Pearson correlation are low, which can be attributed to the fact that tweets and news are inherently very different and achieving a high level of pairwise similarity is a complex task. But for accuracy assessment, which is mostly seen from the perspective of a classification task, there is a clear improvement over other approaches by using transformed embeddings as features. Table 4 shows that T2N achieved better performance as compared to N2T. Although the aforementioned analysis is performed on a small dataset, the results show a promising direction for using Monolingual-TWE, which can easily scale with the size of the common vocabulary across collections, thus giving a possibility to improve or sustain the accuracy and Pearson correlation values on larger datasets.

Similar observations can be made about Cross-Lingual-TWE. Given the complexity associated with finding pairwise relevance between tweets and cross-language news, we compared only LDA based approaches with Cross-Lingual-TWE. It can be seen from Table 5 that T2N outperformed LDA-PTM with notable improvement. Although it may not be significant, these results are only a preliminary examination to motivate research in this direction.

8 Conclusion and Future Work

In this paper, we focused on mapping tweets to monolingual and cross-lingual news by transforming their word embeddings closer to each other, thus bridging the lexical and word usage gap across collections. In future, we aim to improve the quality of results with more sophisticated approaches.

References

1. Kwak, H., Lee, C., Park, H., Moon, S.: What is twitter, a social network or a news media? In: Proceedings of ACM (2010)
2. Zhao, W.X., Jiang, J., Weng, J., He, J., Lim, E.P., Yan, H., Li, X.: Comparing twitter and traditional media using topic models. In: Advances in Information Retrieval, Springer Berlin Heidelberg (2011)
3. Guo, W., Li, H., Ji, H., Diab, M.T.: Linking tweets to news: A framework to enrich short text data in social media. In: Proceedings of ACL (2013)
4. Krestel, R., Werkmeister, T., Wiradarma, T.P., Kasneci, G.: Tweet-recommender: Finding relevant tweets for news articles. In: Proceedings of WWW, ACM (2015)
5. Pennington, J., Socher, R., Manning, C.D.: Glove: Global vectors for word representation. In: Proceedings of EMNLP (2014)
6. Gouws, S., Bengio, Y., Corrado, G.: Bilbowa: Fast bilingual distributed representations without word alignments. In: arXiv preprint (2014)
7. Wang, C., Mahadevan, S.: Manifold alignment using procrustes analysis. In: Proceedings of ICML, ACM (2008)
8. Tan, L., Zhang, H., Clarke, C.L., Smucker, M.D.: Lexical comparison between wikipedia and twitter corpora by using word embeddings. In: Proceedings of ACL (2015)
9. Ritter, A., Etzioni, O., Clark, S.: Open domain event extraction from twitter. In: Proceedings of KDD (2012)
10. Thelwall, M., Buckley, K., Paltoglou, G.: Sentiment in twitter events. Journal of the American Society for Information Science and Technology 62 (2011)
11. Paul, M.J., Dredze, M.: You are what you tweet: Analyzing twitter for public health. In: Proceedings of ICWSM (2011)
12. Ratkiewicz, J., Conover, M., Meiss, M., Gonçalves, B., Flammini, A., Menczer, F.: Detecting and tracking political abuse in social media. In: Proceedings of ICWSM (2011)
13. Crooks, A., Croitoru, A., Stefanidis, A., Radzikowski, J.: #earthquake: Twitter as a distributed sensor system. Transactions in GIS 17(1) (2013)
14. Abel, F., Gao, Q., Houben, G.J., Tao, K.: Analyzing user modeling on twitter for personalized news recommendations. In: Proceedings of UMAP (2011)
15. Jin, O., Liu, N.N., Zhao, K., Yu, Y., Yang, Q.: Transferring topical knowledge from auxiliary long texts for short text clustering. In: Proceedings of CIKM, ACM (2011)
16. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. The Journal of Machine Learning Research 12 (2011)
17. Al-Rfou, R., Perozzi, B., Skiena, S.: Polyglot: Distributed word representations for multilingual nlp. In: Proceedings of CoNLL, ACL (2013)
18. Astudillo, R.F., Amir, S., Ling, W., Silva, M., Trancoso, I.: Learning word representations from scarce and noisy data with embedding sub-spaces. In: Proceedings of ACL (2015)
19. Kim, J., Rousseau, F., Vazirgiannis, M.: Convolutional sentence kernel from word embeddings for short text categorization. In: Proceedings of EMNLP (2015)
20. van der Maaten, L., Hinton, G.: Visualizing data using t-sne. The Journal of Machine Learning Research 9 (2008)
21. Webber, W., Moffat, A., Zobel, J.: A similarity measure for indefinite rankings. ACM Transactions on Information Systems (TOIS) 4 (2010)
22. Schönemann, P.H.: A generalized solution of the orthogonal procrustes problem. Psychometrika 31(1) (1966)
23. Mimno, D., Wallach, H.M., Naradowsky, J., Smith, D.A., McCallum, A.: Polylingual topic models. In: Proceedings of EMNLP, ACL (2009)

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za

More information

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering

More information

Assignment 1: Predicting Amazon Review Ratings

Assignment 1: Predicting Amazon Review Ratings Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for

More information

Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model

Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Xinying Song, Xiaodong He, Jianfeng Gao, Li Deng Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.

More information

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17. Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link

More information

Attributed Social Network Embedding

Attributed Social Network Embedding JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, MAY 2017 1 Attributed Social Network Embedding arxiv:1705.04969v1 [cs.si] 14 May 2017 Lizi Liao, Xiangnan He, Hanwang Zhang, and Tat-Seng Chua Abstract Embedding

More information

arxiv: v1 [cs.cl] 2 Apr 2017

arxiv: v1 [cs.cl] 2 Apr 2017 Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,

More information

Lecture 1: Machine Learning Basics

Lecture 1: Machine Learning Basics 1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3

More information

A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation

A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation SLSP-2016 October 11-12 Natalia Tomashenko 1,2,3 natalia.tomashenko@univ-lemans.fr Yuri Khokhlov 3 khokhlov@speechpro.com Yannick

More information

Efficient Online Summarization of Microblogging Streams

Efficient Online Summarization of Microblogging Streams Efficient Online Summarization of Microblogging Streams Andrei Olariu Faculty of Mathematics and Computer Science University of Bucharest andrei@olariu.org Abstract The large amounts of data generated

More information

Rule Learning With Negation: Issues Regarding Effectiveness

Rule Learning With Negation: Issues Regarding Effectiveness Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United

More information

Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments

Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Vijayshri Ramkrishna Ingale PG Student, Department of Computer Engineering JSPM s Imperial College of Engineering &

More information

arxiv: v2 [cs.ir] 22 Aug 2016

arxiv: v2 [cs.ir] 22 Aug 2016 Exploring Deep Space: Learning Personalized Ranking in a Semantic Space arxiv:1608.00276v2 [cs.ir] 22 Aug 2016 ABSTRACT Jeroen B. P. Vuurens The Hague University of Applied Science Delft University of

Training a Neural Network to Answer 8th Grade Science Questions Steven Hewitt, An Ju, Katherine Stasaski

Training a Neural Network to Answer 8th Grade Science Questions Steven Hewitt, An Ju, Katherine Stasaski Training a Neural Network to Answer 8th Grade Science Questions Steven Hewitt, An Ju, Katherine Stasaski Problem Statement and Background Given a collection of 8th grade science questions, possible answer

Summarizing Answers in Non-Factoid Community Question-Answering

Summarizing Answers in Non-Factoid Community Question-Answering Summarizing Answers in Non-Factoid Community Question-Answering Hongya Song Zhaochun Ren Shangsong Liang hongya.song.sdu@gmail.com zhaochun.ren@ucl.ac.uk shangsong.liang@ucl.ac.uk Piji Li Jun Ma Maarten

Mining Topic-level Opinion Influence in Microblog

Mining Topic-level Opinion Influence in Microblog Mining Topic-level Opinion Influence in Microblog Daifeng Li Dept. of Computer Science and Technology Tsinghua University ldf3824@yahoo.com.cn Jie Tang Dept. of Computer Science and Technology Tsinghua

A deep architecture for non-projective dependency parsing

A deep architecture for non-projective dependency parsing Universidade de São Paulo Biblioteca Digital da Produção Intelectual - BDPI Departamento de Ciências de Computação - ICMC/SCC Comunicações em Eventos - ICMC/SCC 2015-06 A deep architecture for non-projective

Syntactic Patterns versus Word Alignment: Extracting Opinion Targets from Online Reviews

Syntactic Patterns versus Word Alignment: Extracting Opinion Targets from Online Reviews Syntactic Patterns versus Word Alignment: Extracting Opinion Targets from Online Reviews Kang Liu, Liheng Xu and Jun Zhao National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy

Term Weighting based on Document Revision History

Term Weighting based on Document Revision History Term Weighting based on Document Revision History Sérgio Nunes, Cristina Ribeiro, and Gabriel David INESC Porto, DEI, Faculdade de Engenharia, Universidade do Porto. Rua Dr. Roberto Frias, s/n. 4200-465

Cross-lingual Text Fragment Alignment using Divergence from Randomness

Cross-lingual Text Fragment Alignment using Divergence from Randomness Cross-lingual Text Fragment Alignment using Divergence from Randomness Sirvan Yahyaei, Marco Bonzanini, and Thomas Roelleke Queen Mary, University of London Mile End Road, E1 4NS London, UK {sirvan,marcob,thor}@eecs.qmul.ac.uk

Georgetown University at TREC 2017 Dynamic Domain Track

Georgetown University at TREC 2017 Dynamic Domain Track Georgetown University at TREC 2017 Dynamic Domain Track Zhiwen Tang Georgetown University zt79@georgetown.edu Grace Hui Yang Georgetown University huiyang@cs.georgetown.edu Abstract TREC Dynamic Domain

Cross Language Information Retrieval

Cross Language Information Retrieval Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract

Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA. 1. Introduction. Alta de Waal, Jacobus Venter and Etienne Barnard

Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA. 1. Introduction. Alta de Waal, Jacobus Venter and Etienne Barnard Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA Alta de Waal, Jacobus Venter and Etienne Barnard Abstract Most actionable evidence is identified during the analysis phase of digital forensic investigations.

Australian Journal of Basic and Applied Sciences

Australian Journal of Basic and Applied Sciences AENSI Journals Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Journal home page: www.ajbasweb.com Feature Selection Technique Using Principal Component Analysis For Improving Fuzzy C-Mean

Python Machine Learning

Python Machine Learning: Unlock deeper insights into machine learning with this vital guide to cutting-edge predictive analytics. Sebastian Raschka

Rule Learning with Negation: Issues Regarding Effectiveness

Rule Learning with Negation: Issues Regarding Effectiveness Rule Learning with Negation: Issues Regarding Effectiveness Stephanie Chua, Frans Coenen, and Grant Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX

Twitter Sentiment Classification on Sanders Data using Hybrid Approach

Twitter Sentiment Classification on Sanders Data using Hybrid Approach IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders

ATENEA UPC AND THE NEW "Activity Stream" or "WALL" FEATURE Jesus Alcober 1, Oriol Sánchez 2, Javier Otero 3, Ramon Martí 4

ATENEA UPC AND THE NEW Activity Stream or WALL FEATURE Jesus Alcober 1, Oriol Sánchez 2, Javier Otero 3, Ramon Martí 4 ATENEA UPC AND THE NEW "Activity Stream" or "WALL" FEATURE Jesus Alcober 1, Oriol Sánchez 2, Javier Otero 3, Ramon Martí 4 1 Universitat Politècnica de Catalunya (Spain) 2 UPCnet (Spain) 3 UPCnet (Spain)

Team Formation for Generalized Tasks in Expertise Social Networks

Team Formation for Generalized Tasks in Expertise Social Networks IEEE International Conference on Social Computing / IEEE International Conference on Privacy, Security, Risk and Trust Team Formation for Generalized Tasks in Expertise Social Networks Cheng-Te Li Graduate

A heuristic framework for pivot-based bilingual dictionary induction

A heuristic framework for pivot-based bilingual dictionary induction 2013 International Conference on Culture and Computing A heuristic framework for pivot-based bilingual dictionary induction Mairidan Wushouer, Toru Ishida, Donghui Lin Department of Social Informatics,

NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches

NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches Yu-Chun Wang Chun-Kai Wu Richard Tzong-Han Tsai Department of Computer Science

A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval

A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval Yelong Shen Microsoft Research Redmond, WA, USA yeshen@microsoft.com Xiaodong He Jianfeng Gao Li Deng Microsoft Research

arxiv: v2 [cs.cv] 30 Mar 2017

arxiv: v2 [cs.cv] 30 Mar 2017 Domain Adaptation for Visual Applications: A Comprehensive Survey Gabriela Csurka arxiv:1702.05374v2 [cs.cv] 30 Mar 2017 Abstract The aim of this paper 1 is to give an overview of domain adaptation and

Learning Methods in Multilingual Speech Recognition

Learning Methods in Multilingual Speech Recognition Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex

A Case Study: News Classification Based on Term Frequency

A Case Study: News Classification Based on Term Frequency A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center

Learning to Rank with Selection Bias in Personal Search

Learning to Rank with Selection Bias in Personal Search Learning to Rank with Selection Bias in Personal Search Xuanhui Wang, Michael Bendersky, Donald Metzler, Marc Najork Google Inc. Mountain View, CA 94043 {xuanhui, bemike, metzler, najork}@google.com ABSTRACT

Learning From the Past with Experiment Databases

Learning From the Past with Experiment Databases Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University

Matching Similarity for Keyword-Based Clustering

Matching Similarity for Keyword-Based Clustering Matching Similarity for Keyword-Based Clustering Mohammad Rezaei and Pasi Fränti University of Eastern Finland {rezaei,franti}@cs.uef.fi Abstract. Semantic clustering of objects such as documents, web

HLTCOE at TREC 2013: Temporal Summarization

HLTCOE at TREC 2013: Temporal Summarization HLTCOE at TREC 2013: Temporal Summarization Tan Xu University of Maryland College Park Paul McNamee Johns Hopkins University HLTCOE Douglas W. Oard University of Maryland College Park Abstract Our team

CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2

CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2 1 CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2 Peter A. Chew, Brett W. Bader, Ahmed Abdelali Proceedings of the 13 th SIGKDD, 2007 Tiago Luís Outline 2 Cross-Language IR (CLIR) Latent Semantic Analysis

Evaluating Interactive Visualization of Multidimensional Data Projection with Feature Transformation

Evaluating Interactive Visualization of Multidimensional Data Projection with Feature Transformation Multimodal Technologies and Interaction Article Evaluating Interactive Visualization of Multidimensional Data Projection with Feature Transformation Kai Xu 1, *,, Leishi Zhang 1,, Daniel Pérez 2,, Phong

Comment-based Multi-View Clustering of Web 2.0 Items

Comment-based Multi-View Clustering of Web 2.0 Items Comment-based Multi-View Clustering of Web 2.0 Items Xiangnan He 1 Min-Yen Kan 1 Peichu Xie 2 Xiao Chen 3 1 School of Computing, National University of Singapore 2 Department of Mathematics, National University

Bridging Lexical Gaps between Queries and Questions on Large Online Q&A Collections with Compact Translation Models

Bridging Lexical Gaps between Queries and Questions on Large Online Q&A Collections with Compact Translation Models Bridging Lexical Gaps between Queries and Questions on Large Online Q&A Collections with Compact Translation Models Jung-Tae Lee and Sang-Bum Kim and Young-In Song and Hae-Chang Rim Dept. of Computer &

Transfer Learning Action Models by Measuring the Similarity of Different Domains

Transfer Learning Action Models by Measuring the Similarity of Different Domains Transfer Learning Action Models by Measuring the Similarity of Different Domains Hankui Zhuo 1, Qiang Yang 2, and Lei Li 1 1 Software Research Institute, Sun Yat-sen University, Guangzhou, China. zhuohank@gmail.com,lnslilei@mail.sysu.edu.cn

arxiv: v1 [cs.cl] 20 Jul 2015

arxiv: v1 [cs.cl] 20 Jul 2015 How to Generate a Good Word Embedding? Siwei Lai, Kang Liu, Liheng Xu, Jun Zhao National Laboratory of Pattern Recognition (NLPR) Institute of Automation, Chinese Academy of Sciences, China {swlai, kliu,

arxiv: v1 [cs.lg] 15 Jun 2015

arxiv: v1 [cs.lg] 15 Jun 2015 Dual Memory Architectures for Fast Deep Learning of Stream Data via an Online-Incremental-Transfer Strategy arxiv:1506.04477v1 [cs.lg] 15 Jun 2015 Sang-Woo Lee Min-Oh Heo School of Computer Science and

Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments

Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,

Word Embedding Based Correlation Model for Question/Answer Matching

Word Embedding Based Correlation Model for Question/Answer Matching Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) Word Embedding Based Correlation Model for Question/Answer Matching Yikang Shen, 1 Wenge Rong, 2 Nan Jiang, 2 Baolin

Using Web Searches on Important Words to Create Background Sets for LSI Classification

Using Web Searches on Important Words to Create Background Sets for LSI Classification Using Web Searches on Important Words to Create Background Sets for LSI Classification Sarah Zelikovitz and Marina Kogan College of Staten Island of CUNY 2800 Victory Blvd Staten Island, NY 11314 Abstract

LIM-LIG at SemEval-2017 Task1: Enhancing the Semantic Similarity for Arabic Sentences with Vectors Weighting

LIM-LIG at SemEval-2017 Task1: Enhancing the Semantic Similarity for Arabic Sentences with Vectors Weighting El Moatez Billah Nagoudi Laboratoire d'Informatique et de Mathématiques LIM Université Amar

An Effective Framework for Fast Expert Mining in Collaboration Networks: A Group-Oriented and Cost-Based Method

An Effective Framework for Fast Expert Mining in Collaboration Networks: A Group-Oriented and Cost-Based Method Farhadi F, Sorkhi M, Hashemi S et al. An effective framework for fast expert mining in collaboration networks: A group-oriented and cost-based method. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 27(3): 577

arxiv: v2 [cs.cl] 26 Mar 2015

arxiv: v2 [cs.cl] 26 Mar 2015 Effective Use of Word Order for Text Categorization with Convolutional Neural Networks Rie Johnson RJ Research Consulting Tarrytown, NY, USA riejohnson@gmail.com Tong Zhang Baidu Inc., Beijing, China Rutgers

A DISTRIBUTIONAL STRUCTURED SEMANTIC SPACE FOR QUERYING RDF GRAPH DATA

A DISTRIBUTIONAL STRUCTURED SEMANTIC SPACE FOR QUERYING RDF GRAPH DATA International Journal of Semantic Computing Vol. 5, No. 4 (2011) 433–462 © World Scientific Publishing Company DOI: 10.1142/S1793351X1100133X

POS tagging of Chinese Buddhist texts using Recurrent Neural Networks

POS tagging of Chinese Buddhist texts using Recurrent Neural Networks POS tagging of Chinese Buddhist texts using Recurrent Neural Networks Longlu Qin Department of East Asian Languages and Cultures longlu@stanford.edu Abstract Chinese POS tagging, as one of the most important

Constructing Parallel Corpus from Movie Subtitles

Constructing Parallel Corpus from Movie Subtitles Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing

Online Updating of Word Representations for Part-of-Speech Tagging

Online Updating of Word Representations for Part-of-Speech Tagging Online Updating of Word Representations for Part-of-Speech Tagging Wenpeng Yin LMU Munich wenpeng@cis.lmu.de Tobias Schnabel Cornell University tbs49@cornell.edu Hinrich Schütze LMU Munich inquiries@cislmu.org

Unsupervised Cross-Lingual Scaling of Political Texts

Unsupervised Cross-Lingual Scaling of Political Texts Unsupervised Cross-Lingual Scaling of Political Texts Goran Glavaš and Federico Nanni and Simone Paolo Ponzetto Data and Web Science Group University of Mannheim B6, 26, DE-68159 Mannheim, Germany {goran,

Finding Translations in Scanned Book Collections

Finding Translations in Scanned Book Collections Finding Translations in Scanned Book Collections Ismet Zeki Yalniz Dept. of Computer Science University of Massachusetts Amherst, MA, 01003 zeki@cs.umass.edu R. Manmatha Dept. of Computer Science University

Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels

Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels Jörg Tiedemann Uppsala University Department of Linguistics and Philology firstname.lastname@lingfil.uu.se Abstract

Detecting English-French Cognates Using Orthographic Edit Distance

Detecting English-French Cognates Using Orthographic Edit Distance Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National

A Vector Space Approach for Aspect-Based Sentiment Analysis

A Vector Space Approach for Aspect-Based Sentiment Analysis A Vector Space Approach for Aspect-Based Sentiment Analysis by Abdulaziz Alghunaim B.S., Massachusetts Institute of Technology (2015) Submitted to the Department of Electrical Engineering and Computer

Entrepreneurial Discovery and the Demmert/Klein Experiment: Additional Evidence from Germany

Entrepreneurial Discovery and the Demmert/Klein Experiment: Additional Evidence from Germany Entrepreneurial Discovery and the Demmert/Klein Experiment: Additional Evidence from Germany Jana Kitzmann and Dirk Schiereck, Endowed Chair for Banking and Finance, EUROPEAN BUSINESS SCHOOL, International

QuickStroke: An Incremental On-line Chinese Handwriting Recognition System

QuickStroke: An Incremental On-line Chinese Handwriting Recognition System Nada P. Matić John C. Platt Tony Wang Synaptics, Inc. 2381 Bering Drive San Jose, CA 95131, USA Abstract This paper presents

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview

OPTIMIZATION OF TRAINING SETS FOR HEBBIAN-LEARNING-BASED CLASSIFIERS

OPTIMIZATION OF TRAINING SETS FOR HEBBIAN-LEARNING-BASED CLASSIFIERS Václav Kocian, Eva Volná, Michal Janošek, Martin Kotyrba University of Ostrava Department of Informatics and Computers Dvořákova 7,

A Reinforcement Learning Variant for Control Scheduling

A Reinforcement Learning Variant for Control Scheduling A Reinforcement Learning Variant for Control Scheduling Aloke Guha Honeywell Sensor and System Development Center 3660 Technology Drive Minneapolis MN 55417 Abstract We present an algorithm based on reinforcement

Indian Institute of Technology, Kanpur

Indian Institute of Technology, Kanpur Indian Institute of Technology, Kanpur Course Project - CS671A POS Tagging of Code Mixed Text Ayushman Sisodiya (12188) {ayushmn@iitk.ac.in} Donthu Vamsi Krishna (15111016) {vamsi@iitk.ac.in} Sandeep Kumar

INPE São José dos Campos

INPE São José dos Campos INPE-5479 PRE/1778 NONLINEAR ASPECTS OF DATA INTEGRATION FOR LAND COVER CLASSIFICATION IN A NEURAL NETWORK ENVIRONMENT Maria Suelena S. Barros Valter Rodrigues INPE São José dos Campos 1993 SECRETARIA

Learning Methods for Fuzzy Systems

Learning Methods for Fuzzy Systems Learning Methods for Fuzzy Systems Rudolf Kruse and Andreas Nürnberger Department of Computer Science, University of Magdeburg Universitätsplatz, D-396 Magdeburg, Germany Phone : +49.39.67.876, Fax : +49.39.67.8

AQUA: An Ontology-Driven Question Answering System

AQUA: An Ontology-Driven Question Answering System AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.

TRANSFER LEARNING IN MIR: SHARING LEARNED LATENT REPRESENTATIONS FOR MUSIC AUDIO CLASSIFICATION AND SIMILARITY

TRANSFER LEARNING IN MIR: SHARING LEARNED LATENT REPRESENTATIONS FOR MUSIC AUDIO CLASSIFICATION AND SIMILARITY TRANSFER LEARNING IN MIR: SHARING LEARNED LATENT REPRESENTATIONS FOR MUSIC AUDIO CLASSIFICATION AND SIMILARITY Philippe Hamel, Matthew E. P. Davies, Kazuyoshi Yoshii and Masataka Goto National Institute

Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models

Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models Jianfeng Gao Microsoft Research One Microsoft Way Redmond, WA 98052 USA jfgao@microsoft.com Xiaodong He Microsoft

The stages of event extraction

The stages of event extraction The stages of event extraction David Ahn Intelligent Systems Lab Amsterdam University of Amsterdam ahn@science.uva.nl Abstract Event detection and recognition is a complex task consisting of multiple sub-tasks

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,

Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification

Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification Tomi Kinnunen and Ismo Kärkkäinen University of Joensuu, Department of Computer Science, P.O. Box 111, 80101 JOENSUU,

Variations of the Similarity Function of TextRank for Automated Summarization

Variations of the Similarity Function of TextRank for Automated Summarization Variations of the Similarity Function of TextRank for Automated Summarization Federico Barrios 1, Federico López 1, Luis Argerich 1, Rosita Wachenchauzer 12 1 Facultad de Ingeniería, Universidad de Buenos

PIRLS. International Achievement in the Processes of Reading Comprehension Results from PIRLS 2001 in 35 Countries

PIRLS. International Achievement in the Processes of Reading Comprehension Results from PIRLS 2001 in 35 Countries Ina V.S. Mullis Michael O. Martin Eugenio J. Gonzalez PIRLS International Achievement in the Processes of Reading Comprehension Results from PIRLS 2001 in 35 Countries International Study Center International

Speech Emotion Recognition Using Support Vector Machine

Speech Emotion Recognition Using Support Vector Machine Speech Emotion Recognition Using Support Vector Machine Yixiong Pan, Peipei Shen and Liping Shen Department of Computer Technology Shanghai JiaoTong University, Shanghai, China panyixiong@sjtu.edu.cn,

Ensemble Technique Utilization for Indonesian Dependency Parser

Ensemble Technique Utilization for Indonesian Dependency Parser Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id

Modeling function word errors in DNN-HMM based LVCSR systems

Modeling function word errors in DNN-HMM based LVCSR systems Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should

Automating the E-learning Personalization

Automating the E-learning Personalization Automating the E-learning Personalization Fathi Essalmi 1, Leila Jemni Ben Ayed 1, Mohamed Jemni 1, Kinshuk 2, and Sabine Graf 2 1 The Research Laboratory of Technologies of Information and Communication

On the Combined Behavior of Autonomous Resource Management Agents

On the Combined Behavior of Autonomous Resource Management Agents On the Combined Behavior of Autonomous Resource Management Agents Siri Fagernes 1 and Alva L. Couch 2 1 Faculty of Engineering Oslo University College Oslo, Norway siri.fagernes@iu.hio.no 2 Computer Science

There are some definitions for what Word

There are some definitions for what Word Word Embeddings and Their Use In Sentence Classification Tasks Amit Mandelbaum Hebrew University of Jerusalm amit.mandelbaum@mail.huji.ac.il Adi Shalev bitan.adi@gmail.com arxiv:1610.08229v1 [cs.lg] 26

Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models

Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models Richard Johansson and Alessandro Moschitti DISI, University of Trento Via Sommarive 14, 38123 Trento (TN),

Deep recurrent neural networks for aspect-based sentiment analysis of user reviews in different languages

Deep recurrent neural networks for aspect-based sentiment analysis of user reviews in different languages Tarasov D. S. (dtarasov3@gmail.com) Internet portal reviewdot.ru, Kazan,

Diagnostic Test. Middle School Mathematics

Diagnostic Test. Middle School Mathematics Diagnostic Test Middle School Mathematics Copyright 2010 XAMonline, Inc. All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Dave Donnellan, School of Computer Applications Dublin City University Dublin 9 Ireland daviddonnellan@eircom.net Claus Pahl

A Neural Network GUI Tested on Text-To-Phoneme Mapping

A Neural Network GUI Tested on Text-To-Phoneme Mapping A Neural Network GUI Tested on Text-To-Phoneme Mapping MAARTEN TROMPPER Universiteit Utrecht m.f.a.trompper@students.uu.nl Abstract Text-to-phoneme (T2P) mapping is a necessary step in any speech synthesis

arxiv: v1 [cs.lg] 3 May 2013

arxiv: v1 [cs.lg] 3 May 2013 Feature Selection Based on Term Frequency and T-Test for Text Categorization Deqing Wang dqwang@nlsde.buaa.edu.cn Hui Zhang hzhang@nlsde.buaa.edu.cn Rui Liu, Weifeng Lv {liurui,lwf}@nlsde.buaa.edu.cn arxiv:1305.0638v1

Linking Task: Identifying authors and book titles in verbose queries

Linking Task: Identifying authors and book titles in verbose queries Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,

Semantic and Context-aware Linguistic Model for Bias Detection

Semantic and Context-aware Linguistic Model for Bias Detection Semantic and Context-aware Linguistic Model for Bias Detection Sicong Kuang Brian D. Davison Lehigh University, Bethlehem PA sik211@lehigh.edu, davison@cse.lehigh.edu Abstract Prior work on bias detection

THE world surrounding us involves multiple modalities

THE world surrounding us involves multiple modalities 1 Multimodal Machine Learning: A Survey and Taxonomy Tadas Baltrušaitis, Chaitanya Ahuja, and Louis-Philippe Morency arxiv:1705.09406v2 [cs.lg] 1 Aug 2017 Abstract Our experience of the world is multimodal

Characterizing Mathematical Digital Literacy: A Preliminary Investigation. Todd Abel Appalachian State University

Characterizing Mathematical Digital Literacy: A Preliminary Investigation. Todd Abel Appalachian State University Characterizing Mathematical Digital Literacy: A Preliminary Investigation Todd Abel Appalachian State University Jeremy Brazas, Darryl Chamberlain Jr., Aubrey Kemp Georgia State University This preliminary

Transformative Education Website Interactive Map & Case studies Submission Instructions and Agreement http://whoeducationguidelines.org/case-studies/ 2 Background What is transformative education? Transformative

arxiv: v1 [math.at] 10 Jan 2016

arxiv: v1 [math.at] 10 Jan 2016 THE ALGEBRAIC ATIYAH-HIRZEBRUCH SPECTRAL SEQUENCE OF REAL PROJECTIVE SPECTRA arxiv:1601.02185v1 [math.at] 10 Jan 2016 GUOZHEN WANG AND ZHOULI XU Abstract. In this note, we use Curtis's algorithm and the

Learning a Cross-Lingual Semantic Representation of Relations Expressed in Text

Learning a Cross-Lingual Semantic Representation of Relations Expressed in Text Learning a Cross-Lingual Semantic Representation of Relations Expressed in Text Achim Rettinger, Artem Schumilin, Steffen Thoma, and Basil Ell Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
