About 懂中文 Dong Chinese
Developed by Peter Olson.
Blog: 东东's notes
Where do the sentences come from?
Dong Chinese uses a database of 705,493 sentences. The sentences come from several different sources:
Tatoeba (17,355 sentences)
UM-Corpus (29,446 sentences)
    Education (13,080 sentences)
    Microblog (156 sentences)
    News (6,055 sentences)
    Science (1,341 sentences)
    Spoken (8,054 sentences)
    Subtitles (760 sentences)
AI Challenger caption dataset (210,000 images with 565,231 captions)
AI Challenger translation dataset (91,220 sentences)
Programmatically generated small-vocabulary sentences (2,241 sentences)
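The counts listed above can be checked against the stated total. The sketch below is only an illustration: the variable names are made up here, and it assumes that the six domain counts (Education through Subtitles) are subsets of UM-Corpus and that the caption dataset contributes its 565,231 captions rather than its 210,000 images.

```python
# Sentence counts copied from the list above; all names are illustrative.
um_corpus = {
    "Education": 13_080,
    "Microblog": 156,
    "News": 6_055,
    "Science": 1_341,
    "Spoken": 8_054,
    "Subtitles": 760,
}

sources = {
    "Tatoeba": 17_355,
    # Assumption: the six domains above are the parts of UM-Corpus.
    "UM-Corpus": sum(um_corpus.values()),
    # Assumption: captions (not images) are counted as sentences.
    "AI Challenger captions": 565_231,
    "AI Challenger translation": 91_220,
    "Generated small-vocabulary": 2_241,
}

total = sum(sources.values())
print(f"UM-Corpus subtotal: {sources['UM-Corpus']:,}")
print(f"Total sentences: {total:,}")
```

Under those assumptions the UM-Corpus subtotal matches the 29,446 given for that corpus, and the overall sum matches the 705,493 sentences stated at the top of this section.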
How is the percentage of movies and books I understand estimated?
Dong Chinese uses the following data:
What technologies are used?
Dong Chinese was built with the help of the following libraries, frameworks, and services:
The following open-source libraries were created while developing Dong Chinese: