Foreign Accented Australian English
  • Description

    Foreign accent is a distinct attribute of second language (L2) speech production. Foreign-accented speech is characterized by deviations from target language pronunciation caused by the influence of the learner's native language (L1) (Best, 1995; Best & Tyler, 2007; Flege, 1995). These deviations arise from differences in, and contact between, L1 and L2 phonology and phonotactics (e.g. consonant and vowel acoustics and articulation, syllable structure), as well as prosody (e.g. intonation, rhythm, stress, pitch). The degree or strength of a foreign accent varies among L2 learners and depends on a number of factors, such as the age of L2 acquisition (Piske, MacKay, & Flege, 2001), language experience (e.g. length of residence in an L2-speaking country: Bohn & Flege, 1990), amount of use (Flege, Frieda, & Nozawa, 1997), and motivation or language learning aptitude (Piske et al., 2001).

    Automatic speech recognition (ASR) systems are predominantly trained on native speech data. Native speakers have shown flexibility in adapting to L2 speech despite variation in L2 speakers’ accents, proficiency, and fluency. Automatic recognition of foreign-accented speech, however, degrades considerably in comparison to recognition of native speech (Derwing, Munro, & Carbonaro, 2000). Thus, the key challenges in ASR for accented speech are to ensure fast model adaptation on limited (and often homogeneous) speech data and to facilitate recognition of unseen (untrained) accents (He & Zhao, 2002).

    Standard Australian English (AusE) is a distinct regional variety of the English language (Cox & Palethorpe, 2007). To enable accurate ASR of AusE speech, models need to be specifically trained on AusE speech data, in addition to American English and British English data (Chengalvarayan, 2001).

    This dataset contains:
    • Audio data for 226 speakers of Australian English with an Arabic accent (150 males and 76 females).
    • Demographic information for all the speakers.
    • Transcription of the audio data.
    • Lexicon extracted from the data.

    Comprehensive documentation is publicly available. This dataset contains sensitive information. To discuss the data, please contact d.estival@westernsydney.edu.au (ORCID: 0000-0002-6178-3825).


    • Data publication title Foreign Accented Australian English


    • Data type dataset
    • Keywords
      • Accented Australian English (Arabic accent)
      • Language
      • Audio recordings with transcriptions
      • Phonemic lexicon
      • The MARCS Institute
    • Funding source
      • Defence Science and Technology Group (DSTG)
    • Grant number(s)
      • 7989
    • FoR codes
      • 460302 - Audio processing
      • 461199 - Machine learning not elsewhere classified
      • 520405 - Psycholinguistics (incl. speech production and comprehension)
    • SEO codes
      • 140104 - Emerging defence technologies
      • 140102 - Command, control and communications
      • 280121 - Expanding knowledge in psychology
      Temporal (time) coverage
    • Start date 2018/12/01
    • End date 2019/03/31
      Spatial (location, mapping) coverage
    • Locations
      • Bankstown
      • Liverpool
      Data Locations

      The Data Manager is: Dominique Estival
      Access conditions Conditional
      The data will be licensed under
    • Other license
    • Statement of rights in data Copyright Western Sydney University
      Citation Estival, Dominique (2023): Foreign Accented Australian English dataset. Western Sydney University. https://doi.org/10.26183/weve-zg60