Thursday, February 21, 2019
A Corpus-Based Analysis of Mixed Code in Hong Kong Speech
2012 International Conference on Asian Language Processing
978-0-7695-4886-9/12 $26.00 2012 IEEE. DOI 10.1109/IALP.2012.10

John Lee
Halliday Centre for Intelligent Applications of Language Studies
Department of Chinese, Translation and Linguistics
City University of Hong Kong
[email protected]

Abstract
We present a corpus-based analysis of the use of mixed code in Hong Kong speech. From transcriptions of Cantonese television programs, we identify English words embedded within Cantonese utterances, and investigate the motivations for such code-switching. Among the many motivations observed in previous research, we found that four alone account for more than 95% of the use of English words in our speech data across genres, genders, and age groups. We performed analyses over more than 60 hours of transcribed speech, resulting in one of the largest empirical studies to date on this linguistic phenomenon.

Keywords: code-mixing; code-switching; Cantonese; English; corpus linguistics

I. INTRODUCTION

While Cantonese is the mother tongue for the vast majority of the people in Hong Kong, English is also spoken by 43% of the population [1], reflecting the city's heritage as a British colony. A well-known feature of the speech in Hong Kong is code-switching, i.e., the juxtaposition of passages of speech belonging to two different grammatical systems or sub-systems, within the same exchange [2]. Specifically, in the case of Hong Kong, the two grammatical systems are Cantonese and English. The former serves as the matrix language, and the latter as the embedded language, resulting in Cantonese sentences with English segments such as (example taken from [3]):

    heoi3 canteen jam2 caa4
    'Let's go to the canteen for lunch'

Here, the English segment contains only one word (canteen), but in general, it can be a whole clause. We will use the general term code-switching rather than the more specific term code-mixing, which refers to switching below the clause level, even though most English segments in our corpus contain only one or two words (see Table 3).

There is already a large body of literature devoted to the study of Cantonese-English code-switching from the theoretical linguistic point of view [3,4,5]. This paper investigates the motivations behind the use of mixed code, on the basis of a large dataset of speech transcribed from television programs. In Section II, we outline previous research on the motivations of code-switching, and discuss how our investigation complements theirs. In Section III, we describe our methodology for corpus construction, in particular the design of the taxonomy of code-switching motivations. In Section IV, we present an analysis of these motivations according to genre, gender and age.

II. PREVIOUS RESEARCH

The first major framework for classifying code-switching motivations in Hong Kong consists of two categories: expedient and orientational [6]. Central to this framework is the distinction between words in high Cantonese and low Cantonese. In everyday conversations, a speaker sometimes cannot find any word from low Cantonese to describe an object, institution or idea (e.g., application form). Using a word from high Cantonese (e.g., biu2 gaak3), however, would sound too formal and therefore stylistically inappropriate. In expedient mixing, the speaker resorts to an English word; the mixing is pragmatically motivated. In contrast, orientational mixing is socially motivated. The speaker chooses to use English (e.g., barbecue) despite the availability of equivalent words from both low Cantonese (e.g., siu2 je5 sik6) and high Cantonese (e.g., siu1 haau1), since he perceives the subject matter to be inherently more western.

This dichotomy has been criticized as overly simplistic, because of the ambiguity in defining lexical and stylistic equivalents among low Cantonese, high Cantonese, and English. Instead, a four-way taxonomy is proposed: euphemism, specificity, bilingual punning, and the principle of economy [7]. This taxonomy is then further expanded, in a study of code-switching in text media [8], to include quotations, doubling, identity marking, and interjection. These categories will be explained in detail in Section III.

While these classification systems are comprehensive and well grounded, they do not per se convey any sense of the relative importance or distribution of the various motivations. Our goal is, first, to empirically verify the coverage of these classification systems on a large dataset of transcribed speech and, second, to give quantitative answers to questions such as: Which kinds of motivations are the most prominent? Does the range of motivations differ according to the speech genre, or to the speaker's gender or age? We now turn our attention to the methodology for constructing and annotating a speech corpus for these research purposes.

III. DATA

A. Source Material

Our corpus is constructed from television programs broadcast in Hong Kong within the last four years by Television Broadcasts Limited (TVB). The programs belong to a variety of genres, including two drama series, three current-affairs shows, a news program, and a talk show. The news program, TVB News at Six-Thirty, carries the most formal register, containing mostly pre-planned speech by the anchor. The current-affairs shows, Tuesday Report, Sunday Report and Hong Kong Connection, are serious in tone but contain spontaneous discussions. The talk show, My Sweets, is about food and drink. It also contains spontaneous discussions, but the topics tend to be lighter.
Although pre-planned, the speech in both drama series, Moonlight Resonance and Yes Sir, Sorry Sir, is arguably the least formal in register, designed to reflect natural speech in everyday life. Details of these TV programs are presented in Table 1.

Table 1. Television programs that serve as the source material of our corpus.

    Genre            Program                                                Length
    Current affairs  Tuesday Report, Sunday Report, Hong Kong Connection    135 episodes x 20 minutes
    Talk show        My Sweets                                              24 episodes x 30 minutes
    News             TVB News at Six-Thirty                                 5 episodes x 20 minutes
    Drama            Moonlight Resonance; Yes Sir, Sorry Sir                4 episodes x 45 minutes

B. Data Processing

From the television programs listed in Table 1, all code-mixed utterances were transcribed, preserving the original languages, either Cantonese or English. Following standard practice, loan words are not considered to be mixed code; in our context, all English words (e.g., taxi) that have been adapted into Cantonese phonology (e.g., dik1 si2) were excluded.

The TV captions corresponding to each of these utterances are also recorded as part of the corpus. These captions are in standard Chinese, rather than Cantonese. Furthermore, alignments between the Chinese word(s) in the caption and the English word(s) in the utterance are annotated. This information will be used in the classification of motivations. Finally, two kinds of metadata about the speaker are recorded: gender (male or female) and age group (teenager or adult).

C. Taxonomy of Code-Switching Motivations

Our goal is to quantitatively characterize the motivations behind code-switching; to this end, each English segment in the Cantonese sentences in our corpus is to be labeled with a motivation. Due to time constraints, this classification was performed only on the current-affairs and talk shows.

The expedient vs. orientational classification system is too coarse for our purpose. Instead, we adopted the taxonomy in [7,8] as our starting point, then introduced some new categories to accommodate our data. The categories in [7] are as follows. (Footnote 1: A fourth category, bilingual punning, is excluded from our taxonomy. As may be expected, punning is rarer in speech, and is indeed not found in our corpus.)

Euphemism. When a Cantonese word explicitly mentions something that the speaker finds embarrassing, s/he might opt for an English word that contains no such mention. For example, to avoid the female body part hung1 'breast' in the word hung1 wai4 'bra', the speaker might prefer to use the English bra (all examples are taken from [7]):

    tau3 bra gaak3 gaak3
    'A princess whose bra is visible'

Specificity. Sometimes an English expression is preferred because its meaning is more general or specific compared with its near-synonymous counterparts [7] in either low or high Cantonese. For example, the verb to book means to make a reservation for which no money or deposit is required, which is more specific than its closest equivalent in Cantonese, deng6 'to place a reservation'. It is often used in sentences such as:

    ngo5 soeng2 book saam1 dim2
    'I want to book 3 o'clock'

Principle of Economy. An English expression may also be preferred because it is shorter and thus requires less linguistic effort compared with its Chinese/Cantonese equivalent [7]. While the word check-in has two syllables, its Cantonese equivalent baan6 lei5 dang1 gei1 sau2 zuk6 'check-in (on a plane)' has six. The principle of economy is thus likely the reason behind mixed code such as:

    nei5 check-in zo2 mei6 aa3
    'Have you checked in already?'

The taxonomy in [8] builds on the one in [7], further enriching it with the categories below. (Footnote 2: Among these categories is identity marking, for mixed code that marks social characteristics such as social status, education status, occupation, as well as regional affiliation [8]. We found it difficult to objectively identify this motivation, and excluded it from our taxonomy.)

Quotation. When citing text or someone else's speech, one often prefers to use the original code to avoid having to perform translation. An example is the direct speech "What do you think?":

    jau5 go3 pang4 jau5 man6 ngo5 what do you think
    'A friend asked me, "What do you think?"'

Doubling. Originally called Emphasis or avoidance of repetition [8], it will be referred to as Doubling [9] here to make it explicit, as this category refers to English words that are embedded alongside Cantonese words that have the same or nearly the same meaning. The purpose is to emphasize the idea or to avoid repetitions. In the following sentence, it serves as an emphasis:

    very good m4 co3 aa1
    'Very good, very good'

Interjection. English interjections may be inserted into the Cantonese sentence. For example:

    anyway nei5 hou2 sai1 lei6 ak1
    'Anyway, you are awesome'

A significant amount of mixed code in our corpus, however, still does not fit into any of the above categories. Most fall under one of two reasons, Personal Name and Register. We therefore added them to our taxonomy.

Register. This is roughly equivalent to the expedient category in [6], but will be referred to as Register in this paper to make the motivation explicit. Sometimes, the speaker cannot find any equivalent low Cantonese word, but feels awkward using a more formal high Cantonese word (e.g., paai1 deoi3 'party'). As a result, s/he resorts to an English equivalent instead. For example:

    hoi1 ci2 laa1 ngo5 dei6 go3 party
    'Our party is starting'

Personal Name. It is common practice among Hong Kong people to adopt an English name. Although this phenomenon may be considered orientational code-mixing in terms of the western perception [6], it is given its own category, because it is very specific and accounts for a substantial amount of our data.
A typical example is:

    Teresa ngo5 dei6 zing2 dak1 leng3 m4 leng3
    'Teresa, did we make it nicely?'

D. Annotation Procedure

We thus have a total of eight categories in our taxonomy of code-switching motivations. Five of these categories, namely euphemism, quotation, doubling, interjection, and personal name, can usually be unambiguously discerned. The annotator, however, has often found it difficult to distinguish between specificity, register, and principle of economy. To maintain consistency, we adopted the following procedure.

When an English segment does not fit into any of the five easy categories, the annotator is to decide whether it has the same meaning as the Chinese word in the caption to which it is aligned. If it is deemed not to have the same meaning, then it is assigned specificity. If it is equivalent in meaning, and the annotator cannot think of any equivalent in low Cantonese, then it is labeled register. Lastly, if there is a low Cantonese equivalent, but its number of syllables is larger than that of the English segment, then the motivation is principle of economy.

IV. ANALYSIS

This section presents some preliminary analyses on this corpus. We first consider the frequency and length of English segments in Cantonese speech (Section A), then discuss the distribution of the categories of motivations, both overall and with respect to genres, genders, and age groups (Section B).

A. Density and Length of English Segments

It is well known that English words are sprinkled rather liberally in the Cantonese speech in Hong Kong. We measure how the frequency of English segments varies across different genres. As shown in Table 2, the frequency correlates with the register of the genre (see Section III.A). In the drama series, the most colloquial genre, one and a half English words are uttered per minute on average. The talk show occupies second place, and the current-affairs shows have somewhat less frequent English words. In the news program, where the speech is pre-planned, the anchor did not utter any English word.

Table 2. The number of Cantonese sentences containing English segments, and the total number of English words transcribed. The last column shows how often an English word is uttered.

    Program genre    Sentences with English   English words   Frequency (words/min)
    Drama            219                      259             1.4
    Talk show        487                      625             0.87
    Current affairs  1495                     1995            0.74
    News             0                        0               0

Second, we measure the length of the English segments. Table 3 shows that the vast majority of English segments contain no more than two words. Across all genres, more than 80% of the English segments consist of only one English word. This figure is comparable to the 81.4% for text data reported in [8].

Table 3. Proportion of English segments with only one word (e.g., canteen) or two words (e.g., thank you).

    Program genre    One-word   Two-word
    Drama            85%        11%
    Current affairs  85%        11%
    Talk show        81%        17%

B. Motivations for the Use of Mixed Code

A plethora of motivations have been posited for the use of mixed code in Hong Kong (see Section II). Applying our proposed classification system (see Section III.C) on our corpus of transcribed speech, we aim now to discern the relative prevalence of the various kinds of code-switching motivations.

Table 4 shows the distribution of these motivations in the current-affairs and the talk shows. Four dominant motivations (chiefly register, but also personal name, principle of economy, and specificity) are attributed to more than 95% of the English segments. This trend is the same across genres (current-affairs and talk shows), genders (see Table 6), and age groups (see Table 5). All other categories, including quotations, euphemism, doubling, and interjection, are comparatively infrequent.

Genres.
Among the four dominant motivations, register (the use of appropriately informal words) is the most frequent motivation in both the current-affairs and talk shows. Its proportion, however, is significantly more marked (47.4%) in the talk show than in current affairs (36.4%), reflecting the more informal nature of the former.

Table 4. Distribution of code-switching motivations, contrasted between genres.

    Motivation            Current affairs   Talk show
    Register              36.4%             47.4%
    Personal Name         26.8%             24.5%
    Principle of economy  19.0%             17.6%
    Specificity           13.2%             8.2%
    Quotation             2.1%              1.0%
    Doubling              1.4%              0.4%
    Interjection          0.9%              1.0%
    Euphemism             0.3%              0%

Age groups. Table 5 contrasts the distributions of code-switching motivations between adults and teenagers in the current-affairs shows. (Footnote 3: The speakers in the talk show are predominantly adults.) As mentioned above, the four major motivations remain constant. However, teenagers are much more likely than adults to use English words to achieve a more informal register (52.4% vs. 35.1%). They also tend more to opt for English to save effort (23.8% vs. 18.6%). Somewhat surprisingly at first glance, teenagers address others by English names less often than adults (2.4% vs. 28.8%); it turns out that in the conversations in our corpus, teenagers often prefer to address adults with the more formal Chinese names, likely out of respect.

Table 5. Distribution of code-switching motivations, contrasted between age groups.

    Motivation            Adults   Teenagers
    Register              35.1%    52.4%
    Personal Name         28.8%    2.4%
    Principle of economy  18.6%    23.8%
    Specificity           13.1%    14.3%
    Quotation             1.9%     4.0%
    Doubling              1.3%     2.4%
    Interjection          0.9%     0%
    Euphemism             0.3%     0.8%

Genders. Finally, we investigate whether code-switching motivations are biased according to gender. Aggregating statistics from both the current-affairs and talk shows, Table 6 compares the motivations of males and those of females. Females are shown to be more likely to use English names to address others (32.9% vs. 18.9%); men, on the other hand, more frequently use English words to reduce effort (22.9% vs. 14.8%).

Table 6. Distribution of code-switching motivations, contrasted between genders.

    Motivation            Female   Male
    Register              37.5%    40.7%
    Personal Name         32.9%    18.9%
    Principle of economy  14.8%    22.9%
    Specificity           10.9%    13.2%
    Quotation             1.9%     1.7%
    Doubling              1.1%     1.3%
    Interjection          0.7%     1.1%
    Euphemism             0.3%     0.2%

V. CONCLUSIONS

We have described the construction of a corpus of Cantonese-English mixed code, based on speech transcribed from television programs in Hong Kong. Drawn from more than 60 hours of speech, this corpus is among the largest of its type. A novel feature of the corpus is the annotation of the motivation behind each code-mixed utterance. Having proposed a classification system for these motivations, we applied it on our corpus, and described differences in the use of mixed code between genres, genders and age groups. A key finding is that four main motivations (register, personal name, principle of economy, and specificity) account for more than 95% of the embedded English segments.

ACKNOWLEDGMENT

This project was partially funded by a Small-Scale Research Grant from the Department of Chinese, Translation and Linguistics at City University of Hong Kong. We thank piece of music Chong Mak and Hiu Yan Wong for compiling the corpus and performing annotation.

REFERENCES

[1] K. H. Y. Chen, "The Social Distinctiveness of Two Code-mixing Styles in Hong Kong," in Proceedings of the 4th International Symposium on Bilingualism, MA: Cascadilla Press, 2005, pp. 527-541.
[2] J. Gumperz, "The sociolinguistic significance of conversational code-switching," in RELC Journal 8(2), 1977, pp. 1-34.
[3] J. Gibbons, "Code-mixing and koineizing in the speech of students at the University of Hong Kong," in Anthropological Linguistics 21(3), 1979, pp. 113-123.
[4] B. H.-S. Chan, "How does Cantonese-English code-mixing work?", in Language in Hong Kong at Century's End, M. C. Pennington (ed.), 1998, pp. 191-216, Hong Kong: Hong Kong University Press.
[5] D. C. S. Li, "Linguistic convergence: impact of English on Hong Kong Cantonese," in Asian Englishes 2(1), 1999, pp. 5-36.
[6] K. K. Luke, "Why two languages might be better than one: motivations of language mixing in Hong Kong," in Language in Hong Kong at Century's End, M. C. Pennington (ed.), 1998, pp. 145-159, Hong Kong: Hong Kong University Press.
[7] D. C. S. Li, "Cantonese-English code-switching research in Hong Kong: a Y2K review," in World Englishes 19(3), 2000, pp. 305-322.
[8] H. Cao, "Development of a Cantonese-English code-mixing speech recognition system," PhD dissertation, Chinese University of Hong Kong, 2011.
[9] R. Appel and P. Muysken, Language Contact and Bilingualism. London: Arnold, 1987.
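
The annotation procedure of Section III.D is in effect a short decision procedure, and writing it out as code may make the order of the checks easier to follow. The Python sketch below is only an illustration under stated assumptions, not the tooling used for the corpus: the names Motivation, Segment and classify, and all field names, are hypothetical. The paper does not say what label applies when a low Cantonese equivalent exists but is not longer than the English segment, so the sketch returns None in that case rather than guess.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Motivation(Enum):
    """The eight categories of the taxonomy in Section III.C."""
    EUPHEMISM = "euphemism"
    SPECIFICITY = "specificity"
    ECONOMY = "principle of economy"
    QUOTATION = "quotation"
    DOUBLING = "doubling"
    INTERJECTION = "interjection"
    REGISTER = "register"
    PERSONAL_NAME = "personal name"


# The five categories that Section III.D describes as easy to discern.
EASY_CATEGORIES = frozenset({
    Motivation.EUPHEMISM, Motivation.QUOTATION, Motivation.DOUBLING,
    Motivation.INTERJECTION, Motivation.PERSONAL_NAME,
})


@dataclass
class Segment:
    """Hypothetical record of the annotator's judgments for one English segment."""
    english: str
    english_syllables: int
    # Set when the annotator recognizes one of the five easy categories.
    easy_label: Optional[Motivation] = None
    # Does the segment mean the same as the aligned Chinese caption word?
    same_meaning_as_caption: bool = True
    # Syllable count of a low Cantonese equivalent; None if the annotator
    # cannot think of one.
    low_cantonese_syllables: Optional[int] = None


def classify(segment: Segment) -> Optional[Motivation]:
    """Apply the checks of Section III.D in the order the text gives them."""
    if segment.easy_label is not None:
        if segment.easy_label not in EASY_CATEGORIES:
            raise ValueError("easy_label must be one of the five easy categories")
        return segment.easy_label
    if not segment.same_meaning_as_caption:
        return Motivation.SPECIFICITY
    if segment.low_cantonese_syllables is None:
        return Motivation.REGISTER
    if segment.low_cantonese_syllables > segment.english_syllables:
        return Motivation.ECONOMY
    return None  # case not covered by the procedure as described


# Examples from Section III.C. Treating the six-syllable equivalent of
# "check-in" as low Cantonese is an assumption made for illustration.
examples = [
    Segment("Teresa", 3, easy_label=Motivation.PERSONAL_NAME),
    Segment("book", 1, same_meaning_as_caption=False),
    Segment("party", 2),
    Segment("check-in", 2, low_cantonese_syllables=6),
]
for seg in examples:
    print(seg.english, classify(seg))
```

Keeping the annotator's judgments as explicit fields, rather than computing them, reflects that the procedure in the paper relies on human decisions (meaning equivalence, availability of a low Cantonese word); only the ordering of the checks is mechanical.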