Framework for Plagiarism Detection Using Logical Tree-Structured Features and Multi-Layer Clustering


Journal of Contemporary Management
Submitted on 25/10/2015
Article ID:

Salha Alzahrani, Naomie Salim, and Vasile Palade

Dr. Salha Alzahrani (Corresponding Author)
College of Computers and IT, Taif University
Hawiah, Taif, Saudi Arabia
Homepage: www.c2learn.com

Prof. Naomie Salim
Faculty of Computing, Universiti Teknologi Malaysia
Skudai, Johor, Malaysia
Homepage: comp.utm.my/naomie/

Prof. Vasile Palade
Faculty of Engineering and Computing, Coventry University
Priory Street, Coventry CV1 5FB, United Kingdom
Homepage: www.cs.ox.ac.uk/vasile.palade/

Abstract: Different practices of scientific misconduct have appeared recently, and they impose the need for more sophisticated solutions. Logical tree-structured features describe the topology of scientific publications in terms of meaningful parts such as title, abstract, background, methods, results, and references. This paper presents a methodology proposed to uncover plagiarism in scientific publications using structural document features and multi-layer clustering. Logical tree-structured features are extracted as generic classes. Structural components such as paragraphs are organised under these generic classes. Instead of using traditional flat-based plagiarism detection methods, a layer-based clustering approach is proposed to find similar clusters and perform candidate retrieval using the top layer features. The bottom layer features are used to cluster structural components and to detect plagiarism. The suggested framework can be more efficient and reliable for detecting plagiarism in scholarly articles than existing approaches.

Keywords: logical organisation, tree-structured features, clustering, plagiarism detection

JEL Classifications: C00, C82, C

1. Introduction

The problem of plagiarism in the academic world has grown recently with the gigantic amount of digital resources and open access journals available on the Internet.
Universities, publishers and individuals tend to use automatic plagiarism checkers to ensure the integrity of scholarly works. However, there are many ways to enhance the process of plagiarism detection in scientific publications in comparison with current anti-plagiarism software. Scientific publications tend to have a consistent structure with subsequent parts. Several studies on information extraction have addressed the structure of scientific publications (Burget, 2007; Hagen et al., 2004; Lee et al., 2003; Li and Ng, 2004; Wang et al., 2005; Witt and Metzing, 2010; Zhang et al., 2006). Segmentation of scholarly documents takes into consideration that the content structure is presented by visual or physical elements, e.g. location, position, punctuation, length, font size or type, etc. Segmentation may also depend on some keywords, e.g. chapter, introduction, etc., to label specific content.

ISSNs: (Print); (Online) Academic Research Centre of Canada

Several studies have defined the logical structure of documents using different terminologies, such as text-type structures (Hagen et al., 2004; Siddharthan and Teufel, 2007; Teufel and Moens, 2002) and generic classes in scholarly papers (Luong et al., 2010). Different components extracted from the document can be generalised under these types/classes. For example, Teufel and Moens (2002) defined seven types of text, or argumentative zones, according to a so-called rhetorical status, namely Own, Other, Background, Textual, Aim, Basis, and Contrast.

Clustering is the process of grouping together objects or components that tend to have the same or similar features (Manning et al., 2009). Each group of objects is called a cluster. Clustering differs from classification in that we have no idea about the labels (i.e. names of features) of the resulting clusters. In classification, however, we have a set of specific labels or categories, and we want to assign each object to one of them. Text clustering aims to discover documents, terms, passages, websites, or any textual elements which share some textual similarity (Bhatia and Deogun, 1998; Manning et al., 2009; Shehata et al., 2010). The similarity perspective of texts can be defined in various ways. For instance, a car and a horse differ physically but are similar in their functionality. Examples of text clustering include clustering of big data collections into smaller sub-collections, term clustering to find shared themes or concepts in a data set, clustering of sentences from larger text objects about a certain topic, and clustering of websites and search results. As an essential technique in text mining and knowledge discovery, text clustering is very useful for exploratory text analysis.
Thus, it can be applied to detect plagiarism, or in other words, to get a sense of highly similar textual elements and duplicates.

This paper addresses the problem of plagiarism in academic publications such as journal articles and conference papers. The contributions of this paper are twofold: (i) the use of logical tree-structured features for document segmentation and representation, and (ii) the use of a clustering-based approach at different layers for plagiarism detection.

The rest of this paper is organised as follows. Section 2 discusses the literature related to textual features and plagiarism detection techniques. Section 3 describes the logical tree-structured feature extraction method. Section 4 describes the suggested algorithms for multi-layer clustering and plagiarism detection. Finally, in Section 5, we give concluding remarks and outline the future work needed to complete the experiments and accomplish this study.

2. Related Research

State-of-the-art research has addressed textual data features and applied techniques for plagiarism detection (Alzahrani et al., 2012b; Clough, 2000, 2003). In this section, textual data features are classified into two types: flat features and structural features. Tree-structured features of documents are described in depth. Then, we briefly summarise different plagiarism retrieval tasks and detection approaches. The relationship between tree-structured features and clustering techniques is discussed to bridge a gap that remains highly problematic in academic plagiarism detection.

2.1 Text features

Feature representation of textual documents can be classified into flat and structural features. Flat features refer to the lexical, syntactic and semantic properties of the text without considering the orientation of these features throughout the document (Alzahrani et al., 2012b). Examples of these features include character/word n-grams, phrases, sentences, part-of-speech (POS) tags, and others.

Journal of Contemporary Management, Vol. 5, No. 1

Structural features, on the other hand, represent the text as a tree with a root node and child nodes distributed in different layers (at least two layers). For example, a document (root node) can be divided into sections and sections into paragraphs (child nodes). Such a representation reflects the organisation of scientific publications better, as they are highly structured. Structural features also represent the semantics of the content better than flat features.

Structural feature extraction can be divided into block-specific and content-specific (Alzahrani et al., 2012a). Block-specific tree-structured feature representation refers to the use of specific markers such as tags or word counters to represent the tree regardless of the sections in the document. A three-level block-specific tree representation was extracted in (Rahman et al., 2007), as shown in Fig. 1 [a]. In (Chow and Rahman, 2009; Rahman and Chow, 2010), a hierarchical document organisation similar to (Rahman et al., 2007) was used but with different dimensions for the feature vectors, as shown in Fig. 1 [b]. Nonetheless, block-specific features could be semantically insufficient to represent topically related content in the document. Therefore, extraction of content-specific tree-structured features would substantially improve the document representation. For example, scientific documents can be partitioned into sections and sections into paragraphs (Alzahrani et al., 2012b). Tree representations such as document-sections-paragraphs or document-concepts-chunks would greatly characterise semi-structured documents such as books, theses, journal articles and conference papers.
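As a rough illustration of the document-sections-paragraphs representation, the following Python sketch builds a small tree and computes word histograms at any node. The `Node` class, the `build_tree` helper and the toy text are assumptions made for this illustration; they are not taken from any of the cited methods.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One node of the document tree (hypothetical structure, for illustration only)."""
    label: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)

    def histogram(self) -> Counter:
        """Word counts of this node plus all of its descendants."""
        counts = Counter(self.text.lower().split())
        for child in self.children:
            counts.update(child.histogram())
        return counts


def build_tree(sections: Dict[str, List[str]]) -> Node:
    """Build a document -> sections -> paragraphs tree from {section title: [paragraphs]}."""
    root = Node("document")
    for title, paragraphs in sections.items():
        section = Node(title)
        section.children = [
            Node(f"{title}/p{i}", text) for i, text in enumerate(paragraphs, start=1)
        ]
        root.children.append(section)
    return root


# Toy document with two sections (assumed data).
doc = build_tree({
    "Introduction": ["plagiarism detection matters", "clustering groups similar text"],
    "Methods": ["tree features describe structure"],
})
```

Because each node aggregates the histograms of its children, the root histogram corresponds to a flat bag-of-words view, while the lower layers keep the positional information that flat features discard.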
However, content-specific trees pose some challenges: (i) sections have variable length in comparison to, for instance, fixed-length pages in block-specific trees, and (ii) different sections/concepts could have different degrees of importance, which can be exploited for different purposes such as improving document retrieval and plagiarism detection.

Fig. 1. Block-specific tree-structured feature representation of a document (Rahman and Chow, 2010)

2.2 Plagiarism detection

Several research works on plagiarism detection have investigated the development and evaluation of computerised techniques that address this offence. These techniques generally work by scanning two textual documents, computing the degree of similarity, and highlighting highly similar segments as plagiarism. Most plagiarism detection techniques have utilised flat features to represent the textual data (Alzahrani et al., 2012b). Few studies, on the other hand, have used structural features for plagiarism detection. For example, a coarse-to-fine framework for plagiarism detection which implements a document-paragraphs-sentences tree for a collection of web documents was proposed in (Zhang and Chow, 2011). In this regard, matching sentences in the bottom layer obtained better precision in the plagiarism detection results compared with the approach in (Rahman et al., 2007). Additionally, structural information has been investigated to detect significant plagiarism cases in scientific publications (Alzahrani et al., 2012a).

A multilayer SOM (MLSOM) was used for retrieval of a set of documents similar to a suspected document and for plagiarism detection (Chow and Rahman, 2009). The top layer performs document clustering and retrieval, and the bottom layer plays an important role in detecting similar, potentially plagiarised, paragraphs. Given a query document $d_q$, a tree-structured document partitioning approach was first used to construct the document-pages-paragraphs tree. Second, feature vectors of the documents were constructed using a vocabulary table and a PCA projection matrix, and used as the input vector $x_i$. Third, neurons in the upper level were matched with $x_i$ to find the most similar neurons, i.e. documents, using Euclidean distance. A set of documents $D_x$ was marked as having global similarity with $d_q$ and used in the next step. Fourth, the associated nodes of $d_x \in D_x$ in the bottom layer were compared in depth with the third-level nodes of $d_q$ using a paragraph-to-paragraph similarity metric, and the most similar paragraph is the one with the smallest difference.

2.3 Bridging the gap

To sum up, textual features vary from simple lexical features to comprehensive structural features. Two documents having similar word histograms at their root nodes may be completely different in terms of semantics and context. This is because of the different orientation of the same set of words throughout the document, which is reflected by the discriminative lower parts of the tree data. Thus, tree-structured representation can help to achieve better analysis of documents and plagiarism detection. Existing techniques applied to the problem of plagiarism detection do not consider content-specific tree-structured features and multi-layer clustering. In addition, the scope of the current methods (Chow and Rahman, 2009; Rahman et al., 2007) that use block-specific features is limited to literal plagiarism. This research work aims to bridge this gap by using a content-specific tree-structured feature representation, which is better suited to scholarly documents than the one used in (Chow and Rahman, 2009). For this aim, we propose the use of logical feature extraction from scientific documents and multi-layer clustering (i.e. the use of clustering at different layers).
Clustering the root nodes will perform source document retrieval, and clustering at the bottom layers will guide in-depth analysis and plagiarism detection.

3. Logical Tree-Structured Document Model

Scientific publications have a common structure that begins with a title, authors, abstract and keywords, followed by the body, which splits into several parts/components including headers, paragraphs, lists, tables, captions, quotes, references and so on. In contrast to the bag-of-words-based features used by existing methods (Barrón-Cedeño and Rosso, 2009; Grozea et al., 2009; Kasprzak et al., 2009; Lackes et al., 2009), this work implements a feature extraction method that combines structural information and term information from scientific articles. The following sections discuss the segmentation of scientific articles into structural components, the extraction of the logical tree-structured features, the weighting algorithm for structural components, and the construction of the vocabulary lists. A complete algorithm for the proposed tree-structured feature extraction method (TFEM) used in this study is presented in the last section to sum up the whole approach.

3.1 Component-based segmentation

One of the goals of this study is to capture the semantic organisational features of scientific publications. In this work, we propose a tool and a method for structural component extraction based on the visual layout of the document and the raw text (Luong et al., 2010). The tool works by extracting structural components using visual descriptors and keyword indicators. It can extract different constructs, namely Title, Author, Address, Affiliation, Keywords, and Body. The body contains Equations, Figures, Figure captions, Footnotes, List items, Notes, References, Section headers, Subsection headers, Sub-subsection headers, Tables, and Table captions.

3.2 Logical tree-structure extraction

The use of the tree-structured feature representation facilitates the analysis of scientific articles in a hierarchical, rather than a flat, manner. As mentioned in Section 2, block-specific tree-structured features such as document-pages-paragraphs (Chow and Rahman, 2009) and document-paragraphs-sentences (Zhang and Chow, 2011) are not sufficient to represent the semantic organisation of scholarly documents. Therefore, we aim to employ a content-specific tree-structured organisation wherein scientific articles are represented in a logical hierarchical tree, namely

document -> generic classes -> structural components

By generic classes, we mean a section or a group of sections that serves a unique purpose. We believe that classes convey more semantically related components than pages. To reflect the scientific topology of scholarly documents, we propose the following generic classes:

G = {Title, Owner, Abstract, Introduction, Literature review, Methodology, Evaluation, Conclusions, Acknowledgments, References}

3.3 Component-based weighting

A component weight $w_{C,d}$ for a structural component C in a document d can be defined as a quantitative function which measures the weight of C based on the relevance between terms in C and other structural components (de Moura et al., 2010). In this regard, $w_{C,d}$ defines a qualitative importance of a component C in scholarly documents, which can be assigned manually by an expert during the indexing phase of documents. Some methods have been developed (Bounhas and Slimani, 2010; de Moura et al., 2010; Marques Pereira et al., 2005; Marteau et al., 2006) that use typical TF-IDF weighting but take the structural components of documents into consideration. In this paper, we use the approach proposed in (Alzahrani et al., 2012a) to compute $w_{C,d}$ automatically. Two statistical measures, namely Depth and Spread (Alzahrani et al., 2012a), are adapted, as below.
Spread of a term t in a scholarly document d is the number of structural components in d that contain t:

$$\mathrm{Spread}(t,d) = \sum_{C_i \in d} \delta_i, \quad \text{where } \delta_i = \begin{cases} 1 & \text{if } t \in C_i \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

Depth of a term t in a generic class G refers to the frequency of t in G normalised by the maximum frequency in G, so that we do not underestimate classes with few components:

$$\mathrm{Depth}(t,g) = \frac{tf_{t,g}}{MAX_{t',g}} \tag{2}$$

where $tf_{t,g}$ is the term frequency in generic class G, and $MAX_{t',g}$ is the maximum frequency gained by a term t′ in G.

Spread-based and Depth-based component-weight factors are defined at the component level as follows:

$$w^{Spread}_{C,d} = \frac{\sum_{t \in C} \mathrm{Spread}(t,d)}{|C|} \tag{3}$$

$$w^{Depth}_{C,d} = \frac{\sum_{t \in C} \mathrm{Depth}(t,g)}{|C|} \tag{4}$$

where t refers to index terms in a component C, d is the article that contains C, and |C| is the size of C. Finally, we combine Depth and Spread into a single factor:

$$w_{C,d} = \frac{\sum_{t \in C} \mathrm{Depth}(t,g) \times \mathrm{Spread}(t,d)}{|C|} \tag{5}$$

3.4 Vocabulary building

To build the vocabulary list, three steps need to be done. The first is to construct the term frequency table, which contains the terms and their occurrence information in the structural components of each d, as follows:

$$tf'_{t,C} = tf_{t,C} \times w_{C,d} \tag{6}$$

$$tf'_{t,d} = \sum_{G \in d} \sum_{C \in G} tf'_{t,C} \tag{7}$$

where $tf_{t,C}$ is the frequency of a term t in a structural component C, $w_{C,d}$ is the combined component-weight factor given by formula (5), and $tf'_{t,C}$ and $tf'_{t,d}$ are the new frequency measures of terms in C and d, combined with the structural information taken from the document.

Second, we construct the term weighting table using the frequency table, in a way similar to the VSM model, as follows:

$$w_{t,d} = tf'_{t,d} \cdot \log \frac{|D|}{|D_t|} \tag{8}$$

where |D| is the total number of documents in the dataset, and $|D_t|$ is the number of documents in the collection that contain t. Third, the vocabulary table T is built, which includes the terms that obtain the top weights. For document features, we will consider 100 terms, while for generic classes G and structural components C, 150 and 200 top-frequency terms will be used, respectively.

3.5 Tree-structure feature extraction

The proposed algorithm for feature extraction is shown in Fig. 2. For all structural components C in each d, we will construct the feature vector $f_C$ from the term frequencies computed in (6), as stated in formula (9). Then, the feature vectors for generic classes, called $f_G$, can be obtained as in equation (10).

$$f_C = [\,tf'_{t_1,C},\ tf'_{t_2,C},\ \ldots,\ tf'_{t_n,C}\,] \tag{9}$$

$$f_G = \sum_{C \in G} f_C \tag{10}$$

On the other hand, the document feature vector $f_d$ is constructed using the weights computed in formula (8), as below.

$$f_d = [\,w_{t_1,d},\ w_{t_2,d},\ \ldots,\ w_{t_n,d}\,] \tag{11}$$

where n is the selected number of top terms used to represent the feature vectors in each layer.

Fig. 2. Tree-structure feature extraction method (TFEM)
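A minimal Python sketch of how formulas (1), (2) and (5) to (8) could be computed on a toy corpus is given here. It assumes that each document is a mapping from a generic class name to a list of tokenised components, that |C| is the number of distinct terms in a component, and that the logarithm is natural; the function names and the data are hypothetical and only meant to make the weighting scheme concrete.

```python
import math
from collections import Counter


def spread(term, doc):
    """Formula (1): number of structural components of the document that contain the term."""
    return sum(1 for comps in doc.values() for comp in comps if term in comp)


def depth(term, components):
    """Formula (2): term frequency in a generic class over the class's maximum term frequency."""
    counts = Counter(tok for comp in components for tok in comp)
    return counts[term] / max(counts.values())


def component_weight(comp, components, doc):
    """Formula (5): average of Depth * Spread over the distinct terms of one component."""
    terms = set(comp)  # assumption: |C| counts distinct index terms
    return sum(depth(t, components) * spread(t, doc) for t in terms) / len(terms)


def weighted_tf(doc):
    """Formulas (6)-(7): component-weighted term frequencies summed over the whole document."""
    doc_tf = Counter()
    for components in doc.values():
        for comp in components:
            w = component_weight(comp, components, doc)
            for term, tf in Counter(comp).items():
                doc_tf[term] += tf * w
    return doc_tf


def tfidf(corpus):
    """Formula (8): weighted frequency times log(|D| / |D_t|) for every document."""
    tfs = [weighted_tf(doc) for doc in corpus]
    df = Counter(term for tf in tfs for term in tf)
    n = len(corpus)
    return [{t: v * math.log(n / df[t]) for t, v in tf.items()} for tf in tfs]


def feature_vector(weights, vocabulary):
    """Formulas (9)/(11): fixed-order vector over the selected vocabulary terms."""
    return [weights.get(t, 0.0) for t in vocabulary]


# Toy corpus (assumed data): generic class -> list of tokenised components.
doc_a = {
    "Abstract": [["plagiarism", "detection"]],
    "Methodology": [["tree", "features", "tree"], ["clustering", "tree"]],
}
doc_b = {"Abstract": [["image", "detection"]]}
weights = tfidf([doc_a, doc_b])
```

In this toy corpus the term "detection" occurs in both documents, so its weight under formula (8) is zero, whereas "tree" is both frequent and spread over two components of `doc_a` and therefore receives the largest weight.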

4. Multi-Layer Clustering and Plagiarism Detection

In plagiarism detection research, we deal with two sets of documents: a source collection D and a set of query documents $D_q$. In this study, both sets are represented as content-specific tree-structured features. The proposed framework includes three main steps:

Step 1: Clustering at the top layer. Clustering is performed at the top layer based on the document features $f_d$. The aim of this step is to find, for each query document $d_q \in D_q$, a subset of the document collection $D_x \subset D$ which is relatively smaller than D.

Step 2: Clustering at the middle layer. For each query document $d_q$, we use only the set of relatively similar documents $D_x$ obtained from Step 1. Then, clustering on $D_x$ is performed at the middle layer based on the generic class features $f_G$. The aim of this step is to find similar sections or subjects between documents (i.e. generic classes), and mark them for further analysis.

Step 3: Clustering at the bottom layer and plagiarism detection. This step aims to find all suspicious components $C_q$ in $d_q$, $d_q \in D_q$, which are plagiarised from structural components $C_x$ in $d_x$, $d_x \in D_x$, using the structural component-based comparison algorithm explained below.

Clustering in the top and middle layers can be achieved using general text clustering techniques such as generative probabilistic models, agglomerative hierarchical clustering (Bhatia and Deogun, 1998), and K-means clustering algorithms (Manning et al., 2009). Then, to find the cluster that is most likely to contain the set of source documents, we will use the cosine similarity between the centre of each cluster $d_j$ and the query $d_q$, which can be calculated as follows:

$$\mathrm{Sim}(d_j, d_q) = \frac{\vec{d_j} \cdot \vec{d_q}}{|\vec{d_j}|\,|\vec{d_q}|} = \frac{\sum_{i=1}^{n} w_{t_i,j}\, w_{t_i,q}}{\sqrt{\sum_{i=1}^{n} w_{t_i,j}^{2}}\ \sqrt{\sum_{i=1}^{n} w_{t_i,q}^{2}}} \tag{12}$$

In the last step, detailed analysis and similarity calculation are performed to find the structural components that are highly similar. To this end, the associated nodes of $d_x \in D_x$ in the bottom layer will be compared component-to-component with the feature vectors of the third layer of $d_q$. By components we generally mean paragraphs. The similarity between the feature vectors of structural components can be calculated using the vector difference. The most similar paragraph is the one with the smallest difference, as stated by the equation below.

$$PD(d_q, d_x) = \sum_{C_q \in d_q} \min_{C_x \in d_x} \left\lVert f_{C_x} - f_{C_q} \right\rVert \tag{13}$$

where $f_C$ are the paragraph features for documents $d_q$ and $d_x$. Further analysis by humans may then distinguish plagiarised components from properly cited ones.

5. Conclusion and Future Work

Plagiarism in scientific publications is addressed in this paper. We propose a rough-to-fine framework for feature extraction, namely logical content-specific tree-structured features, wherein structural components are organised under generic classes. Clustering is suggested at different layers to achieve document retrieval and plagiarism detection. The suggested methods and algorithms offer a better understanding of the semantic content and exploratory analysis of scientific publications. Future work includes the construction of a ground-truth dataset of scientific documents taking into account an accurate XML tree representation. Experiments should be performed on the dataset to evaluate the proposed framework. More in-depth analysis of structural components should be performed, and information visualisation methods can be used for highlighting plagiarism in a way that is different from other types of documents.
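For readers who want to experiment before such a dataset exists, the cluster selection of equation (12) and the paragraph comparison of equation (13) in Section 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: cluster centres and paragraph features are plain lists of floats, the vector difference is measured with the Euclidean norm, and all function names and numbers are hypothetical.

```python
import math


def cosine(u, v):
    """Equation (12): cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_cluster(centres, query):
    """Index of the cluster centre that is most similar to the query vector."""
    return max(range(len(centres)), key=lambda j: cosine(centres[j], query))


def difference(u, v):
    """Length of the vector difference (assumption: Euclidean norm)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def paragraph_difference(query_paragraphs, source_paragraphs):
    """Equation (13): for each query paragraph take the smallest difference, then sum."""
    return sum(
        min(difference(fq, fx) for fx in source_paragraphs)
        for fq in query_paragraphs
    )


# Toy top-layer data (assumed): two cluster centres and one query document vector.
centres = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
query = [0.0, 2.0, 1.0]
best = nearest_cluster(centres, query)

# Toy bottom-layer data (assumed): paragraph feature vectors of a query and a source document.
query_paragraphs = [[1.0, 0.0], [0.0, 3.0]]
source_paragraphs = [[1.0, 0.0], [0.0, 1.0]]
pd_value = paragraph_difference(query_paragraphs, source_paragraphs)
```

A small `pd_value` indicates that every query paragraph has a close counterpart in the source document; how small is small enough would have to be set empirically.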

References

[1] Alzahrani, S., et al. (2012a). "Using structural information and citation evidence to detect significant plagiarism cases". Journal of the American Society for Information Science and Technology (JASIST), 63(2).
[2] Alzahrani, S. M., Salim, N., and Abraham, A. (2012b). "Understanding Plagiarism Linguistic Patterns, Textual Features and Detection Methods". IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 42(2).
[3] Barrón-Cedeño, A., and Rosso, P. (2009). "On automatic plagiarism detection based on n-grams comparison". Advances in Information Retrieval. DOI: / _69.
[4] Bhatia, S. K., and Deogun, J. S. (1998). "Conceptual clustering in information retrieval". IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 28(3).
[5] Bounhas, I., and Slimani, Y. (2010). "A hierarchical approach for semi-structured document indexing and terminology extraction". Paper presented at the International Conference on Information Retrieval and Knowledge Management, CAMP'10, Selangor, Malaysia.
[6] Burget, R. (2007). "Automatic Document Structure Detection for Data Integration". In: W. Abramowicz (Ed.), Business Information Systems (Vol. 4439). Springer Berlin / Heidelberg.
[7] Chow, T. W. S., and Rahman, M. K. M. (2009). "Multilayer SOM with tree-structured data for efficient document retrieval and plagiarism detection". IEEE Transactions on Neural Networks, 20(9).
[8] Clough, P. (2000). "Plagiarism in Natural and Programming Languages: An Overview of Current Tools and Technologies". Department of Computer Science, University of Sheffield, UK, Technical Report CS
[9] Clough, P. (2003). "Old and new challenges in automatic plagiarism detection". National UK Plagiarism Advisory Service. [Online] Available at plagiarism.pdf.
[10] de Moura, E. S., et al. (2010). "Using structural information to improve search in Web collections". Journal of the American Society for Information Science and Technology, 61(12). DOI: /asi
[11] Grozea, C., Gehl, C., and Popescu, M. (2009). "ENCOPLOT: Pairwise sequence matching in linear time applied to plagiarism detection". Paper presented at the 25th Conference of the Spanish Society for Natural Language Processing, SEPLN'09, Donostia, Spain.
[12] Hagen, L., Harald, L., and Petra Saskia, B. (2004). "Text type structure and logical document structure". Paper presented at the ACL Workshop on Discourse Annotation, Barcelona, Spain.
[13] Kasprzak, J., Brandejs, M., and Křipač, M. (2009). "Finding Plagiarism by Evaluating Document Similarities". Paper presented at the 25th Conference of the Spanish Society for Natural Language Processing, SEPLN'09, Donostia, Spain.
[14] Lackes, R., Bartels, J., Berndt, E., and Frank, E. (2009). "A word-frequency based method for detecting plagiarism in documents". Paper presented at the International Conference on Information Reuse and Integration, IRI'09, Las Vegas, NV.
[15] Lee, K. H., Choy, Y. C., and Cho, S. B. (2003). "Logical structure analysis and generation for structured documents: A syntactic approach". IEEE Transactions on Knowledge and Data Engineering, 15(5).
[16] Li, Z., and Ng, W. K. (2004). "WICCAP: From semi-structured data to structured data". Paper presented at the 11th IEEE International Conference and Workshop on the Engineering of Computer-Based Systems, ECBS'04, Brno, Czech Republic.
[17] Luong, M.-T., Nguyen, T. D., and Kan, M.-Y. (2010). "Logical structure recovery in scholarly articles with rich document features". International Journal of Digital Library Systems (IJDLS), 1(4).
[18] Manning, C. D., Raghavan, P., and Schütze, H. (2009). "Flat Clustering". In: Introduction to Information Retrieval. Cambridge University Press.
[19] Marques Pereira, R. A., Molinari, A., and Pasi, G. (2005). "Contextual weighted representations and indexing models for the retrieval of HTML documents". Soft Computing, 9(7).
[20] Marteau, P.-F., Ménier, G., and Popovici, E. (2006). "Weighted Naïve Bayes model for semi-structured document categorization". Paper presented at the 1st International Conference on Multidisciplinary Information Sciences and Technologies, InSciT2006, Merida, Spain.
[21] Rahman, M. K. M., and Chow, T. W. S. (2010). "Content-based hierarchical document organization using multi-layer hybrid network and tree-structured features". Expert Systems with Applications, 37(4).
[22] Rahman, M. K. M., Yang, W. P., Chow, T. W. S., and Wu, S. (2007). "A flexible multi-layer self-organizing map for generic processing of tree-structured data". Pattern Recognition, 40(5).
[23] Shehata, S., Karray, F., and Kamel, M. (2010). "An efficient concept-based mining model for enhancing text clustering". IEEE Transactions on Knowledge and Data Engineering, 22(10).
[24] Siddharthan, A., and Teufel, S. (2007). "Whose idea was this, and why does it matter? Attributing scientific work to citations". Paper presented at the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2007), New York, USA.
[25] Teufel, S., and Moens, M. (2002). "Summarizing scientific articles: Experiments with relevance and rhetorical status". Computational Linguistics, 28(4).
[26] Wang, Z. Q., Wang, Y. C., and Gao, K. (2005). "A New Model of Document Structure Analysis". In: Fuzzy Systems and Knowledge Discovery (Vol. 3614). Springer Berlin / Heidelberg.
[27] Witt, A., and Metzing, D. (2010). "Discourse Relations and Document Structure". In: N. Ide, J. Véronis, H. Baayen, K. W. Church, J. Klavans, D. T. Barnard, D. Tufis, J. Llisterri, S. Johansson & J. Mariani (Eds.), Linguistic Modeling of Information and Markup Languages (Vol. 40). Springer Netherlands.
[28] Zhang, H., and Chow, T. W. S. (2011). "A coarse-to-fine framework to efficiently thwart plagiarism". Pattern Recognition, 44(2).
[29] Zhang, K., Wu, G., and Li, J. (2006). "Logical structure based semantic relationship extraction from semi-structured documents". Paper presented at the 15th International Conference on World Wide Web, Edinburgh, Scotland.


More information

Twitter Sentiment Classification on Sanders Data using Hybrid Approach

Twitter Sentiment Classification on Sanders Data using Hybrid Approach IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders

More information

MASTER S THESIS GUIDE MASTER S PROGRAMME IN COMMUNICATION SCIENCE

MASTER S THESIS GUIDE MASTER S PROGRAMME IN COMMUNICATION SCIENCE MASTER S THESIS GUIDE MASTER S PROGRAMME IN COMMUNICATION SCIENCE University of Amsterdam Graduate School of Communication Kloveniersburgwal 48 1012 CX Amsterdam The Netherlands E-mail address: scripties-cw-fmg@uva.nl

More information

SARDNET: A Self-Organizing Feature Map for Sequences

SARDNET: A Self-Organizing Feature Map for Sequences SARDNET: A Self-Organizing Feature Map for Sequences Daniel L. James and Risto Miikkulainen Department of Computer Sciences The University of Texas at Austin Austin, TX 78712 dljames,risto~cs.utexas.edu

More information

Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments

Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Vijayshri Ramkrishna Ingale PG Student, Department of Computer Engineering JSPM s Imperial College of Engineering &

More information

Unit 7 Data analysis and design

Unit 7 Data analysis and design 2016 Suite Cambridge TECHNICALS LEVEL 3 IT Unit 7 Data analysis and design A/507/5007 Guided learning hours: 60 Version 2 - revised May 2016 *changes indicated by black vertical line ocr.org.uk/it LEVEL

More information

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za

More information

CS Machine Learning

CS Machine Learning CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing

More information

Longest Common Subsequence: A Method for Automatic Evaluation of Handwritten Essays

Longest Common Subsequence: A Method for Automatic Evaluation of Handwritten Essays IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 6, Ver. IV (Nov Dec. 2015), PP 01-07 www.iosrjournals.org Longest Common Subsequence: A Method for

More information

Python Machine Learning

Python Machine Learning Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled

More information

Cross Language Information Retrieval

Cross Language Information Retrieval Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................

More information

Feature Selection based on Sampling and C4.5 Algorithm to Improve the Quality of Text Classification using Naïve Bayes

Feature Selection based on Sampling and C4.5 Algorithm to Improve the Quality of Text Classification using Naïve Bayes Feature Selection based on Sampling and C4.5 Algorithm to Improve the Quality of Text Classification using Naïve Bayes Viviana Molano 1, Carlos Cobos 1, Martha Mendoza 1, Enrique Herrera-Viedma 2, and

More information

A Comparison of Two Text Representations for Sentiment Analysis

A Comparison of Two Text Representations for Sentiment Analysis 010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational

More information

Postprint.

Postprint. http://www.diva-portal.org Postprint This is the accepted version of a paper presented at CLEF 2013 Conference and Labs of the Evaluation Forum Information Access Evaluation meets Multilinguality, Multimodality,

More information

Reducing Features to Improve Bug Prediction

Reducing Features to Improve Bug Prediction Reducing Features to Improve Bug Prediction Shivkumar Shivaji, E. James Whitehead, Jr., Ram Akella University of California Santa Cruz {shiv,ejw,ram}@soe.ucsc.edu Sunghun Kim Hong Kong University of Science

More information

TextGraphs: Graph-based algorithms for Natural Language Processing

TextGraphs: Graph-based algorithms for Natural Language Processing HLT-NAACL 06 TextGraphs: Graph-based algorithms for Natural Language Processing Proceedings of the Workshop Production and Manufacturing by Omnipress Inc. 2600 Anderson Street Madison, WI 53704 c 2006

More information

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should

More information

On-Line Data Analytics

On-Line Data Analytics International Journal of Computer Applications in Engineering Sciences [VOL I, ISSUE III, SEPTEMBER 2011] [ISSN: 2231-4946] On-Line Data Analytics Yugandhar Vemulapalli #, Devarapalli Raghu *, Raja Jacob

More information

Procedia - Social and Behavioral Sciences 141 ( 2014 ) WCLTA Using Corpus Linguistics in the Development of Writing

Procedia - Social and Behavioral Sciences 141 ( 2014 ) WCLTA Using Corpus Linguistics in the Development of Writing Available online at www.sciencedirect.com ScienceDirect Procedia - Social and Behavioral Sciences 141 ( 2014 ) 124 128 WCLTA 2013 Using Corpus Linguistics in the Development of Writing Blanka Frydrychova

More information

Cross-lingual Text Fragment Alignment using Divergence from Randomness

Cross-lingual Text Fragment Alignment using Divergence from Randomness Cross-lingual Text Fragment Alignment using Divergence from Randomness Sirvan Yahyaei, Marco Bonzanini, and Thomas Roelleke Queen Mary, University of London Mile End Road, E1 4NS London, UK {sirvan,marcob,thor}@eecs.qmul.ac.uk

More information

arxiv: v1 [cs.cl] 2 Apr 2017

arxiv: v1 [cs.cl] 2 Apr 2017 Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,

More information

Language Acquisition Fall 2010/Winter Lexical Categories. Afra Alishahi, Heiner Drenhaus

Language Acquisition Fall 2010/Winter Lexical Categories. Afra Alishahi, Heiner Drenhaus Language Acquisition Fall 2010/Winter 2011 Lexical Categories Afra Alishahi, Heiner Drenhaus Computational Linguistics and Phonetics Saarland University Children s Sensitivity to Lexical Categories Look,

More information

OCR for Arabic using SIFT Descriptors With Online Failure Prediction

OCR for Arabic using SIFT Descriptors With Online Failure Prediction OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,

More information

Automating the E-learning Personalization

Automating the E-learning Personalization Automating the E-learning Personalization Fathi Essalmi 1, Leila Jemni Ben Ayed 1, Mohamed Jemni 1, Kinshuk 2, and Sabine Graf 2 1 The Research Laboratory of Technologies of Information and Communication

More information

Seminar - Organic Computing

Seminar - Organic Computing Seminar - Organic Computing Self-Organisation of OC-Systems Markus Franke 25.01.2006 Typeset by FoilTEX Timetable 1. Overview 2. Characteristics of SO-Systems 3. Concern with Nature 4. Design-Concepts

More information

A Note on Structuring Employability Skills for Accounting Students

A Note on Structuring Employability Skills for Accounting Students A Note on Structuring Employability Skills for Accounting Students Jon Warwick and Anna Howard School of Business, London South Bank University Correspondence Address Jon Warwick, School of Business, London

More information

Multi-Lingual Text Leveling

Multi-Lingual Text Leveling Multi-Lingual Text Leveling Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency

More information

On document relevance and lexical cohesion between query terms

On document relevance and lexical cohesion between query terms Information Processing and Management 42 (2006) 1230 1247 www.elsevier.com/locate/infoproman On document relevance and lexical cohesion between query terms Olga Vechtomova a, *, Murat Karamuftuoglu b,

More information

The taming of the data:

The taming of the data: The taming of the data: Using text mining in building a corpus for diachronic analysis Stefania Degaetano-Ortlieb, Hannah Kermes, Ashraf Khamis, Jörg Knappen, Noam Ordan and Elke Teich Background Big data

More information

EDUCATION AND THE PUBLIC DIMENSION OF MUSEUMS

EDUCATION AND THE PUBLIC DIMENSION OF MUSEUMS xce e ce an equity EDUCATION AND THE PUBLIC DIMENSION OF MUSEUMS A Report from the American Association of Museums, 1992 Foreord this report from the American Association of Museums points the ay for museums

More information

Chinese Language Parsing with Maximum-Entropy-Inspired Parser

Chinese Language Parsing with Maximum-Entropy-Inspired Parser Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art

More information

The College Board Redesigned SAT Grade 12

The College Board Redesigned SAT Grade 12 A Correlation of, 2017 To the Redesigned SAT Introduction This document demonstrates how myperspectives English Language Arts meets the Reading, Writing and Language and Essay Domains of Redesigned SAT.

More information

The stages of event extraction

The stages of event extraction The stages of event extraction David Ahn Intelligent Systems Lab Amsterdam University of Amsterdam ahn@science.uva.nl Abstract Event detection and recognition is a complex task consisting of multiple sub-tasks

More information

Facing our Fears: Reading and Writing about Characters in Literary Text

Facing our Fears: Reading and Writing about Characters in Literary Text Facing our Fears: Reading and Writing about Characters in Literary Text by Barbara Goggans Students in 6th grade have been reading and analyzing characters in short stories such as "The Ravine," by Graham

More information

Outline. Web as Corpus. Using Web Data for Linguistic Purposes. Ines Rehbein. NCLT, Dublin City University. nclt

Outline. Web as Corpus. Using Web Data for Linguistic Purposes. Ines Rehbein. NCLT, Dublin City University. nclt Outline Using Web Data for Linguistic Purposes NCLT, Dublin City University Outline Outline 1 Corpora as linguistic tools 2 Limitations of web data Strategies to enhance web data 3 Corpora as linguistic

More information

An NFR Pattern Approach to Dealing with Non-Functional Requirements

An NFR Pattern Approach to Dealing with Non-Functional Requirements An NFR Pattern Approach to Dealing with Non-Functional Requirements Presenter: Sam Supakkul Outline Motivation The Approach NFR Patterns Pattern Organization Pattern Reuse Tool Support Case Study Conclusion

More information

Learning Methods in Multilingual Speech Recognition

Learning Methods in Multilingual Speech Recognition Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex

More information

Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling

Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Notebook for PAN at CLEF 2013 Andrés Alfonso Caurcel Díaz 1 and José María Gómez Hidalgo 2 1 Universidad

More information

NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches

NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches Yu-Chun Wang Chun-Kai Wu Richard Tzong-Han Tsai Department of Computer Science

More information

HISTORY COURSE WORK GUIDE 1. LECTURES, TUTORIALS AND ASSESSMENT 2. GRADES/MARKS SCHEDULE

HISTORY COURSE WORK GUIDE 1. LECTURES, TUTORIALS AND ASSESSMENT 2. GRADES/MARKS SCHEDULE HISTORY COURSE WORK GUIDE 1. LECTURES, TUTORIALS AND ASSESSMENT Lectures and Tutorials Students studying History learn by reading, listening, thinking, discussing and writing. Undergraduate courses normally

More information

Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks

Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Devendra Singh Chaplot, Eunhee Rhim, and Jihie Kim Samsung Electronics Co., Ltd. Seoul, South Korea {dev.chaplot,eunhee.rhim,jihie.kim}@samsung.com

More information

Using Web Searches on Important Words to Create Background Sets for LSI Classification

Using Web Searches on Important Words to Create Background Sets for LSI Classification Using Web Searches on Important Words to Create Background Sets for LSI Classification Sarah Zelikovitz and Marina Kogan College of Staten Island of CUNY 2800 Victory Blvd Staten Island, NY 11314 Abstract

More information

Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model

Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Xinying Song, Xiaodong He, Jianfeng Gao, Li Deng Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.

More information

Evolutive Neural Net Fuzzy Filtering: Basic Description

Evolutive Neural Net Fuzzy Filtering: Basic Description Journal of Intelligent Learning Systems and Applications, 2010, 2: 12-18 doi:10.4236/jilsa.2010.21002 Published Online February 2010 (http://www.scirp.org/journal/jilsa) Evolutive Neural Net Fuzzy Filtering:

More information

Speech Emotion Recognition Using Support Vector Machine

Speech Emotion Recognition Using Support Vector Machine Speech Emotion Recognition Using Support Vector Machine Yixiong Pan, Peipei Shen and Liping Shen Department of Computer Technology Shanghai JiaoTong University, Shanghai, China panyixiong@sjtu.edu.cn,

More information

Rubric for Scoring English 1 Unit 1, Rhetorical Analysis

Rubric for Scoring English 1 Unit 1, Rhetorical Analysis FYE Program at Marquette University Rubric for Scoring English 1 Unit 1, Rhetorical Analysis Writing Conventions INTEGRATING SOURCE MATERIAL 3 Proficient Outcome Effectively expresses purpose in the introduction

More information

Mining Association Rules in Student s Assessment Data

Mining Association Rules in Student s Assessment Data www.ijcsi.org 211 Mining Association Rules in Student s Assessment Data Dr. Varun Kumar 1, Anupama Chadha 2 1 Department of Computer Science and Engineering, MVN University Palwal, Haryana, India 2 Anupama

More information

Ensemble Technique Utilization for Indonesian Dependency Parser

Ensemble Technique Utilization for Indonesian Dependency Parser Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id

More information

Language Independent Passage Retrieval for Question Answering

Language Independent Passage Retrieval for Question Answering Language Independent Passage Retrieval for Question Answering José Manuel Gómez-Soriano 1, Manuel Montes-y-Gómez 2, Emilio Sanchis-Arnal 1, Luis Villaseñor-Pineda 2, Paolo Rosso 1 1 Polytechnic University

More information

Knowledge-Based - Systems

Knowledge-Based - Systems Knowledge-Based - Systems ; Rajendra Arvind Akerkar Chairman, Technomathematics Research Foundation and Senior Researcher, Western Norway Research institute Priti Srinivas Sajja Sardar Patel University

More information

Data Integration through Clustering and Finding Statistical Relations - Validation of Approach

Data Integration through Clustering and Finding Statistical Relations - Validation of Approach Data Integration through Clustering and Finding Statistical Relations - Validation of Approach Marek Jaszuk, Teresa Mroczek, and Barbara Fryc University of Information Technology and Management, ul. Sucharskiego

More information

Dickinson ISD ELAR Year at a Glance 3rd Grade- 1st Nine Weeks

Dickinson ISD ELAR Year at a Glance 3rd Grade- 1st Nine Weeks 3rd Grade- 1st Nine Weeks R3.8 understand, make inferences and draw conclusions about the structure and elements of fiction and provide evidence from text to support their understand R3.8A sequence and

More information

CEFR Overall Illustrative English Proficiency Scales

CEFR Overall Illustrative English Proficiency Scales CEFR Overall Illustrative English Proficiency s CEFR CEFR OVERALL ORAL PRODUCTION Has a good command of idiomatic expressions and colloquialisms with awareness of connotative levels of meaning. Can convey

More information

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17. Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link

More information

Abstractions and the Brain

Abstractions and the Brain Abstractions and the Brain Brian D. Josephson Department of Physics, University of Cambridge Cavendish Lab. Madingley Road Cambridge, UK. CB3 OHE bdj10@cam.ac.uk http://www.tcm.phy.cam.ac.uk/~bdj10 ABSTRACT

More information

National Literacy and Numeracy Framework for years 3/4

National Literacy and Numeracy Framework for years 3/4 1. Oracy National Literacy and Numeracy Framework for years 3/4 Speaking Listening Collaboration and discussion Year 3 - Explain information and ideas using relevant vocabulary - Organise what they say

More information

Degree Qualification Profiles Intellectual Skills

Degree Qualification Profiles Intellectual Skills Degree Qualification Profiles Intellectual Skills Intellectual Skills: These are cross-cutting skills that should transcend disciplinary boundaries. Students need all of these Intellectual Skills to acquire

More information

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics (L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes

More information

Arizona s English Language Arts Standards th Grade ARIZONA DEPARTMENT OF EDUCATION HIGH ACADEMIC STANDARDS FOR STUDENTS

Arizona s English Language Arts Standards th Grade ARIZONA DEPARTMENT OF EDUCATION HIGH ACADEMIC STANDARDS FOR STUDENTS Arizona s English Language Arts Standards 11-12th Grade ARIZONA DEPARTMENT OF EDUCATION HIGH ACADEMIC STANDARDS FOR STUDENTS 11 th -12 th Grade Overview Arizona s English Language Arts Standards work together

More information

Term Weighting based on Document Revision History

Term Weighting based on Document Revision History Term Weighting based on Document Revision History Sérgio Nunes, Cristina Ribeiro, and Gabriel David INESC Porto, DEI, Faculdade de Engenharia, Universidade do Porto. Rua Dr. Roberto Frias, s/n. 4200-465

More information

Maurício Serva (Coordinator); Danilo Melo; Déris Caetano; Flávia Regina P. Maciel;

Maurício Serva (Coordinator); Danilo Melo; Déris Caetano; Flávia Regina P. Maciel; CALL FOR PAPERS 3 rd International Colloquium on Epistemology and Sociology of Management Science 20-22 March 2012 Florianópolis - SC - Brazil Sub-themes: I. Epistemological Analysis of Management Science

More information

Bug triage in open source systems: a review

Bug triage in open source systems: a review Int. J. Collaborative Enterprise, Vol. 4, No. 4, 2014 299 Bug triage in open source systems: a review V. Akila* and G. Zayaraz Department of Computer Science and Engineering, Pondicherry Engineering College,

More information

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,

More information

and secondary sources, attending to such features as the date and origin of the information.

and secondary sources, attending to such features as the date and origin of the information. RH.9-10.1. Cite specific textual evidence to support analysis of primary and secondary sources, attending to such features as the date and origin of the information. RH.9-10.1. Cite specific textual evidence

More information

MANAGERIAL LEADERSHIP

MANAGERIAL LEADERSHIP MANAGERIAL LEADERSHIP MGMT 3287-002 FRI-132 (TR 11:00 AM-12:15 PM) Spring 2016 Instructor: Dr. Gary F. Kohut Office: FRI-308/CCB-703 Email: gfkohut@uncc.edu Telephone: 704.687.7651 (office) Office hours:

More information

Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge

Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge Innov High Educ (2009) 34:93 103 DOI 10.1007/s10755-009-9095-2 Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge Phyllis Blumberg Published online: 3 February

More information

Highlighting and Annotation Tips Foundation Lesson

Highlighting and Annotation Tips Foundation Lesson English Highlighting and Annotation Tips Foundation Lesson About this Lesson Annotating a text can be a permanent record of the reader s intellectual conversation with a text. Annotation can help a reader

More information

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering

More information

MYP Language A Course Outline Year 3

MYP Language A Course Outline Year 3 Course Description: The fundamental piece to learning, thinking, communicating, and reflecting is language. Language A seeks to further develop six key skill areas: listening, speaking, reading, writing,

More information

Georgetown University at TREC 2017 Dynamic Domain Track

Georgetown University at TREC 2017 Dynamic Domain Track Georgetown University at TREC 2017 Dynamic Domain Track Zhiwen Tang Georgetown University zt79@georgetown.edu Grace Hui Yang Georgetown University huiyang@cs.georgetown.edu Abstract TREC Dynamic Domain

More information

QuickStroke: An Incremental On-line Chinese Handwriting Recognition System

QuickStroke: An Incremental On-line Chinese Handwriting Recognition System QuickStroke: An Incremental On-line Chinese Handwriting Recognition System Nada P. Matić John C. Platt Λ Tony Wang y Synaptics, Inc. 2381 Bering Drive San Jose, CA 95131, USA Abstract This paper presents

More information

Lecture 1: Basic Concepts of Machine Learning

Lecture 1: Basic Concepts of Machine Learning Lecture 1: Basic Concepts of Machine Learning Cognitive Systems - Machine Learning Ute Schmid (lecture) Johannes Rabold (practice) Based on slides prepared March 2005 by Maximilian Röglinger, updated 2010

More information

A Topic Maps-based ontology IR system versus Clustering-based IR System: A Comparative Study in Security Domain

A Topic Maps-based ontology IR system versus Clustering-based IR System: A Comparative Study in Security Domain A Topic Maps-based ontology IR system versus Clustering-based IR System: A Comparative Study in Security Domain Myongho Yi 1 and Sam Gyun Oh 2* 1 School of Library and Information Studies, Texas Woman

More information

Prentice Hall Literature: Timeless Voices, Timeless Themes Gold 2000 Correlated to Nebraska Reading/Writing Standards, (Grade 9)

Prentice Hall Literature: Timeless Voices, Timeless Themes Gold 2000 Correlated to Nebraska Reading/Writing Standards, (Grade 9) Nebraska Reading/Writing Standards, (Grade 9) 12.1 Reading The standards for grade 1 presume that basic skills in reading have been taught before grade 4 and that students are independent readers. For

More information

Axiom 2013 Team Description Paper

Axiom 2013 Team Description Paper Axiom 2013 Team Description Paper Mohammad Ghazanfari, S Omid Shirkhorshidi, Farbod Samsamipour, Hossein Rahmatizadeh Zagheli, Mohammad Mahdavi, Payam Mohajeri, S Abbas Alamolhoda Robotics Scientific Association

More information

Compositional Semantics

Compositional Semantics Compositional Semantics CMSC 723 / LING 723 / INST 725 MARINE CARPUAT marine@cs.umd.edu Words, bag of words Sequences Trees Meaning Representing Meaning An important goal of NLP/AI: convert natural language

More information

CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2

CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2 1 CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2 Peter A. Chew, Brett W. Bader, Ahmed Abdelali Proceedings of the 13 th SIGKDD, 2007 Tiago Luís Outline 2 Cross-Language IR (CLIR) Latent Semantic Analysis

More information

Finding Translations in Scanned Book Collections

Finding Translations in Scanned Book Collections Finding Translations in Scanned Book Collections Ismet Zeki Yalniz Dept. of Computer Science University of Massachusetts Amherst, MA, 01003 zeki@cs.umass.edu R. Manmatha Dept. of Computer Science University

More information

Oakland Unified School District English/ Language Arts Course Syllabus

Oakland Unified School District English/ Language Arts Course Syllabus Oakland Unified School District English/ Language Arts Course Syllabus For Secondary Schools The attached course syllabus is a developmental and integrated approach to skill acquisition throughout the

More information

Problems of the Arabic OCR: New Attitudes

Problems of the Arabic OCR: New Attitudes Problems of the Arabic OCR: New Attitudes Prof. O.Redkin, Dr. O.Bernikova Department of Asian and African Studies, St. Petersburg State University, St Petersburg, Russia Abstract - This paper reviews existing

More information

Conversational Framework for Web Search and Recommendations

Conversational Framework for Web Search and Recommendations Conversational Framework for Web Search and Recommendations Saurav Sahay and Ashwin Ram ssahay@cc.gatech.edu, ashwin@cc.gatech.edu College of Computing Georgia Institute of Technology Atlanta, GA Abstract.

More information

INPE São José dos Campos

INPE São José dos Campos INPE-5479 PRE/1778 MONLINEAR ASPECTS OF DATA INTEGRATION FOR LAND COVER CLASSIFICATION IN A NEDRAL NETWORK ENVIRONNENT Maria Suelena S. Barros Valter Rodrigues INPE São José dos Campos 1993 SECRETARIA

More information