:::

Academia Sinica

Recent Events

:::

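Talk or Lecture
Institute of Linguistics

Location
Conference Room 519, Institute of Linguistics, Academia Sinica
Speaker
Joshua K. Hartshorne, Assistant Professor (Boston College, USA)
Event status
Confirmed
Event URL
https://www.ling.sinica.edu.tw/main/zh-tw?/sign/1214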

Towards a computational and experimental linguistics of Formosan

2023-12-14 10:00 - 12:00

The recently published Handbook of Formosan Linguistics clearly shows the impressive progress that has been made in fieldwork and linguistic analysis. At the same time, it also highlights how little quantitative work has been done: there are almost no computational, psycholinguistic, or language acquisition studies. Because quantitative studies have been important drivers of linguistic theory, this gap is a problem not just for the quantitative language sciences but for linguistic theory as a whole. Critically, because all Formosan languages are endangered, the window of opportunity for experimental work is rapidly closing.

Currently, the main obstacle to quantitative studies of Formosan is the lack of machine-readable corpora. It is obvious that corpora are the basis of essentially all computational linguistics. It may be less immediately obvious that they are a prerequisite for psycholinguistic and language acquisition studies as well. This is because most experimental designs require knowledge of word frequencies, cloze probabilities, and other corpus-based statistics.
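Word frequency is the simplest of the corpus-based statistics mentioned above. The following is a minimal sketch, assuming the corpus has already been tokenized into lists of word strings. The function name `frequencies_per_million` and the toy tokens are hypothetical placeholders; they are not part of FormosanBank or any real Formosan data.

```python
from collections import Counter


def frequencies_per_million(sentences):
    """Return each token's frequency per million tokens.

    Assumes `sentences` is already tokenized (a list of lists of strings);
    tokenizing Formosan orthographies is a separate, harder problem.
    """
    # Count every token across all sentences.
    counts = Counter(token for sentence in sentences for token in sentence)
    total = sum(counts.values())
    if total == 0:
        # An empty corpus has no frequencies to report.
        return {}
    # Normalize raw counts to a per-million-token rate.
    return {word: count * 1_000_000 / total for word, count in counts.items()}


# Hypothetical toy corpus with placeholder tokens, not real Formosan data.
toy_corpus = [["w1", "w2", "w1"], ["w3", "w1"]]
print(frequencies_per_million(toy_corpus))
# {'w1': 600000.0, 'w2': 200000.0, 'w3': 200000.0}
```

Normalizing to a per-million rate makes frequencies comparable across corpora of different sizes, which matters when texts for different languages vary widely in length.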

In principle, a substantial amount of corpus material has already been compiled for Formosan languages. In practice, converting this material into a format that allows statistical analysis is a major undertaking. In this talk, I describe our grant-funded efforts to build FormosanBank, a machine-readable corpus of all 16 extant Formosan languages. I describe our progress so far, our short-term plans, and how researchers can contribute. I also present initial results from using machine learning to speed up the collection and processing of new corpora.


:::