
Biography
Yuqi Chen is a research assistant focusing on the CQH Team’s research initiatives in digital humanities.
Related Publications and Projects

Extracting geographic information from historical texts presents unique challenges. To address them, this study leverages generative large language models (LLMs) to extract historical toponyms and their corresponding location references from texts. A historical geocoder then identifies the coordinates of the extracted toponyms and calculates their maximum error distances from the location references, indicating the degree of uncertainty. Both the extraction and geocoding processes are integrated into a novel tool named ‘His-Geo’ (https://github.com/yukiyuqichen/His-Geo). To evaluate the results, this study also curates a manually annotated dataset, the Early China Historical Geographic Corpus (CHGC-Early). The dataset fills a gap left by the absence of geographic data for early China in existing gazetteers and provides a benchmark for training and evaluating approaches to geographic information extraction from premodern Chinese texts. The evaluation shows a satisfactory F1 score of 0.831 for the GPT-4o model, demonstrating the remarkable capability of generative large language models to extract geographic information from lengthy, unstructured texts that encompass diverse and sometimes conflicting views.