Center for Research on Bangla Language Processing (CRBLP)
::--Corpus Analysis & Corpus Collection--::

Name:: Shadin Bangla Corpus

Summary:: The aim of this research is to analyze a text corpus for regularities and anomalies in Bangla script. Such analysis depends on a balanced text corpus, which requires a large volume of text, so the research team is developing a way to collect Bangla text.

Details:: This project targets newspaper text and some older texts, such as the Prothom-Alo newspaper, Charjapad, and Baru Chandi Das Er Kabbo. The CRBLP team selected one year of text from the most popular newspaper, “Prothom-Alo”. This newspaper corpus covers 32 categories of news, such as daily news, literature, economics, international, and science. Analyzing the corpus involved a significant amount of work: collecting text from the web, converting it to Unicode, and then analyzing it. The analysis criteria include word frequency lists, bi-gram and tri-gram analysis, and letter frequency. Corpus collection is important for developing a balanced text corpus, which will help in studying different linguistic phenomena.

Team::
Past team:
Status:: Tool is available for download.

Timeline:: 2006-2008
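
The analysis criteria listed under Details (word frequency, bi-gram and tri-gram counts, letter frequency) can be illustrated with a short sketch. This is not the CRBLP tool itself. The function names `tokenize`, `ngrams`, and `corpus_stats` are hypothetical, and the sketch assumes the text has already been converted to Unicode. It treats a word as a run of code points from the Bengali Unicode block (U+0980 to U+09FF) and counts letters as individual code points, which are both simplifications.

```python
import re
import unicodedata
from collections import Counter

# Assumption: a "word" is a run of code points in the Bengali block.
# Joiners such as ZWNJ/ZWJ fall outside this range and will split a token.
BANGLA_TOKEN = re.compile(r"[\u0980-\u09FF]+")


def tokenize(text):
    """Normalize to NFC and return the Bangla tokens in order."""
    text = unicodedata.normalize("NFC", text)
    return BANGLA_TOKEN.findall(text)


def ngrams(tokens, n):
    """Return all consecutive n-token tuples (empty if too few tokens)."""
    return list(zip(*(tokens[i:] for i in range(n))))


def corpus_stats(text):
    """Count words, bi-grams, tri-grams, and letters in one text.

    N-grams here run across sentence boundaries, and "letter" means a
    single code point (vowel signs are counted separately), not a
    grapheme cluster.
    """
    tokens = tokenize(text)
    return {
        "word": Counter(tokens),
        "bigram": Counter(ngrams(tokens, 2)),
        "trigram": Counter(ngrams(tokens, 3)),
        "letter": Counter(ch for tok in tokens for ch in tok),
    }
```

Calling `corpus_stats(text)["word"].most_common(20)` on a Unicode text would then give a small word frequency list. A real analysis would likely need sentence-aware n-grams and a more careful definition of a letter.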
Center for Research on Bangla Language Processing, BRAC University, Dhaka, Bangladesh © All Rights Reserved 2008