Title: 以語料庫為依據之學術英文字彙之研究
A Corpus-based Approach to Academic Vocabulary
Author: 林美宏
Mei-Hung Lin
Chih-Hua Kuo
Keywords: EAP; Genre Analysis; Vocabulary; Academic Vocabulary; Corpus-based Approach
Publication date: 2006
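Abstract: Academic English writing is receiving more attention than ever because of the dominant position of English in academia and the growing number of students in higher education. Among the many genres of academic English, the research article (RA) has been widely studied because it represents the principal output of academic research and serves to advance scholarly standing. Previous studies have examined this genre from several angles, such as paragraph structure, rhetorical functions, and linguistic features. Since Swales developed the CARS model, the Introduction has become the most intensively studied section of the RA. Meanwhile, advances in computer corpus technology have in recent years opened new perspectives for research on vocabulary learning. Some studies construct wordlists that give learners explicit vocabulary learning goals. Others broaden the study of individual words to collocations or lexical bundles. Still others examine how words are used in different discourse contexts. Most vocabulary research today adopts corpus-based analysis, aided by natural language processing tools, to investigate vocabulary use in large amounts of authentic language data. Within genre analysis, however, few studies have explored the genre-specific character of specialized vocabulary, that is, how vocabulary use relates to the rhetorical functions of a genre. This study therefore explores the link between vocabulary use in RA Introductions and the rhetorical functions of the genre. Combining corpus and genre analysis, we investigate how moves, or rhetorical functions, are realized through vocabulary. We built a specialized corpus of 60 RAs in computer science, annotated the moves of every article with a coding scheme we developed, and then analyzed the vocabulary of the corpus quantitatively with NLP tools that were either self-developed or already available. Using a high-frequency wordlist, we analyzed the proportions of general English vocabulary (GSL), academic vocabulary (AWL), and technical vocabulary in the corpus. The results show that technical vocabulary accounts for a large proportion of computer science RAs. Word frequency profiles further show that a small number of words, although highly repeated, make up a very low proportion of all the distinct words in the corpus, whereas low-frequency words account for more than half of all distinct words; this suggests that some low-frequency words should also be learning targets for RA writers. For pedagogical purposes, we therefore built a wordlist that covers 95% of the content of computer science RAs. To identify the vocabulary that signals move functions, we then identified the rhetorical functions or moves in RA Introductions, classified them as major or optional moves according to their frequency and distribution, and analyzed the common move patterns of the major moves. To understand how moves are realized through vocabulary, we extended the analysis from words to lexical bundles, on the assumption that a genre has lexical bundles that represent its rhetorical functions. We searched for lexical bundles in the Introduction corpus and in the corpus of each move, and divided the bundles found into two functional types: bundles that express the rhetorical function of a particular move, and bundles that express general academic discourse functions. Finally, we discuss how the findings can be applied to the learning of academic English vocabulary.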
English for Academic Purposes (EAP) has been attracting more attention than before because of the predominant role of English in the research world and the increasing number of students in higher education. Research articles (RAs), among all the genres in EAP, have been widely studied as a result of their wide distribution and promotional nature. Studies of RAs have examined various aspects of this genre, especially textual organization, rhetorical functions, and linguistic features. The RA Introduction, in particular, has become the most studied section, following the seminal work of Swales’ CARS model. Vocabulary learning, meanwhile, has regained momentum in recent years. Some studies have focused on providing learners with specific vocabulary learning goals by developing wordlists for different purposes. Some have extended the study of vocabulary to word combinations such as collocations or lexical bundles. Still others have investigated how words are used in various discourse contexts. Most vocabulary studies nowadays are based on the analysis of target corpora. The corpus-based approach exploits large amounts of authentic language-use data, often using NLP tools to facilitate efficient analysis. However, in the field of genre analysis, little research has been devoted to the generic nature of specialized vocabulary, in other words, to relating vocabulary use to the rhetorical functions of a genre. This study, therefore, aims to explore vocabulary use in RAs, particularly in the Introduction section, in relation to its rhetorical functions. A corpus-based, genre-informed approach is used to examine how rhetorical functions or moves are realized through move-signaling words. We construct a specialized corpus consisting of 60 RAs in the field of computer science (CS). All the RAs are coded with a self-developed coding scheme. 
Then, the text samples are analyzed quantitatively with the help of readily available or self-developed NLP tools. To explore the nature of the words used in RAs in this particular field, we compile the frequency list of the corpus and analyze the coverage of the GSL (28.20%), the AWL (12.75%), and technical words (as generally represented by off-list words) (59.05%) in the list. As these figures show, technical vocabulary accounts for a large share of the CS corpus, suggesting that the vocabulary learning goals of learners in CS could be directed towards words other than those in the GSL or AWL. Word frequency profiles further reveal that a very small number of word-forms have very high occurrence rates, while low-frequency words account for more than half of the vocabulary of the corpus. It can then be inferred that the low-frequency words form a very wide vocabulary repertoire that RA writers need to use. As a result, we further develop a CS wordlist for pedagogical purposes. It consists of 1402 word families and covers 95% of the vocabulary (types) in the corpus. Next, our focus is directed towards identifying rhetorical functions or moves in RA Introductions in order to further investigate move-signaling words. The major and optional moves are identified based on frequency and range. We then analyze common move patterns for each of the major moves, including 3-move and 4-move patterns. To explore how the moves are realized through vocabulary, we extend our examination from words to word combinations (or lexical bundles), since each register has its own set of lexical bundles that can represent its typical rhetorical functions. Lexical bundles in the Introduction as well as in each major move are identified. Two types of meaningful bundles are observed. One type signals the rhetorical functions of a specific move, while the other reflects general academic discourse functions and is categorized in this study as general bundles. 
General bundles are further categorized into stance bundles, discourse organizers, and referential bundles based on the discourse functions they perform in texts. Among them, referential bundles are found to be the most frequently used. Finally, pedagogical applications and implications, such as the use of concordancing tools in the learning of academic vocabulary, are discussed on the basis of the research results.
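The two quantitative steps described in the abstract, wordlist coverage over the types of a corpus and the extraction of recurrent lexical bundles, can be illustrated with a minimal Python sketch. It is not the tooling used in the thesis. The tokenizer, the function names, the thresholds, and the tiny word sets standing in for the GSL and AWL are all assumptions made for illustration, and the sketch counts word forms rather than word families.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word forms (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def type_coverage(tokens, wordlists):
    """Return the share of distinct word forms (types) covered by each wordlist.

    Lists are applied in the order given; types matched by no list are
    reported as 'off-list', used here as a rough proxy for technical words.
    """
    types = set(tokens)
    remaining = set(types)
    shares = {}
    for name, words in wordlists.items():
        hits = remaining & words
        shares[name] = len(hits) / len(types)
        remaining -= hits
    shares["off-list"] = len(remaining) / len(types)
    return shares


def lexical_bundles(texts, n=4, min_freq=2, min_texts=2):
    """Return n-word sequences that recur often enough and in enough texts.

    min_freq and min_texts are hypothetical cut-offs; real studies set them
    relative to corpus size.
    """
    freq = Counter()   # total occurrences of each n-gram
    spread = Counter() # number of texts containing each n-gram
    for text in texts:
        tokens = tokenize(text)
        grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(grams)
        spread.update(set(grams))
    return {g: c for g, c in freq.items()
            if c >= min_freq and spread[g] >= min_texts}


# Toy data: two invented sentences and placeholder wordlists (not the real GSL/AWL).
sample_texts = [
    "In this paper we propose a new algorithm",
    "In this paper we present the results of the analysis",
]
sample_lists = {
    "GSL": {"in", "this", "we", "a", "new", "the", "of", "paper"},
    "AWL": {"analysis"},
}
all_tokens = [tok for text in sample_texts for tok in tokenize(text)]
print(type_coverage(all_tokens, sample_lists))
print(lexical_bundles(sample_texts))
```

Requiring a bundle to appear in several texts, and not only to be frequent overall, guards against sequences that a single author repeats; a fuller treatment would also group inflected forms into word families before computing coverage.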
Appears in Collections: Thesis

Files in This Item:

  1. 552201.pdf
