{"id":1394,"date":"2018-05-01T16:29:12","date_gmt":"2018-05-01T16:29:12","guid":{"rendered":"http:\/\/www.lancaster.ac.uk\/digging-ecm\/?page_id=1394"},"modified":"2022-12-12T10:56:39","modified_gmt":"2022-12-12T10:56:39","slug":"corpus","status":"publish","type":"page","link":"https:\/\/www.lancaster.ac.uk\/digging-ecm\/corpus\/","title":{"rendered":"Outputs, Corpus & Datasets"},"content":{"rendered":"

<\/div>
<\/div>
<\/div>
<\/div><\/div><\/div>

OUTPUTS OF THE PROJECT<\/strong><\/h2>\n<\/div>
<\/div>
<\/div><\/div>
<\/div>

The project resulted in multiple outputs, some of which, due to their nature, we will continue to develop in the years to come.<\/p>\n<\/div>

<\/div><\/div><\/div>
<\/div>
<\/div>
<\/div>
<\/div><\/div><\/div><\/div><\/div>

DECM Historical Gazetteer<\/strong><\/h2>\n

The historical gazetteer is a digital directory of historical places in Mexico and Guatemala. It was compiled from primary sources and research carried out by the team, together with information collected from comprehensive studies of the political, religious, and administrative units of the Viceroyalty of New Spain. It contains thousands of place names, their alternative names and locations, and a wealth of other historical geographic information in Geographic Information Systems format.<\/p>\n

The research resulted in the\u00a0DECM Online Gazetteer<\/strong><\/a> and a downloadable DECM GIS Dataset<\/a>.<\/strong><\/p>\n

The process followed in creating the gazetteer is explained in ‘The Creation of the DECM Historical Gazetteer<\/strong><\/a>‘ report.<\/p>\n

You can also consult a summary of the development of the DECM Historical Gazetteer in this story map<\/strong><\/a>.<\/p>\n

Please cite this resource as:<\/p>\n

Murrieta-Flores, P., Jim\u00e9nez-Badillo, D., Martins, B., Liceras-Garrido, R., Favila-V\u00e1zquez, M., and Bellamy, K. (2023) \r\nDigging into Early Colonial Mexico Historical Gazetteer. Figshare, Dataset. DOI:10.6084\/m9.figshare.12301682<\/a><\/code><\/pre>\n

DECM Corpus<\/strong><\/h2>\n

The DECM Corpus is a digital corpus produced from the original editions of the texts of the Relaciones Geogr\u00e1ficas (RGs) and is available in several versions. These include a machine-ready version, a manually annotated training dataset for machine learning, and an automatically annotated version ready for text mining and machine learning experiments. All of these can be downloaded from the DECM Github Corpus<\/strong><\/a> page.<\/p>\n

The DECM Machine Ready Corpus<\/h3>\n

This version includes text-only files (.txt) containing each of the 10 volumes originally edited by Ren\u00e9 Acu\u00f1a, the 2 volumes edited by Mercedes de la Garza et al., and the Suma de Visita edited by Del Paso y Troncoso, together with a file containing the original text of the Crown mandate (Instrucci\u00f3n) and metadata for the collection. It contains only the original text of each RG as transcribed by the scholars, excluding any editorial notes, commentary, or historical work. It can therefore be used directly for corpus linguistics analyses, visualisations, etc.<\/p>\n

Please cite this resource as:<\/p>\n

Murrieta-Flores, P., Jim\u00e9nez-Badillo, D., Martins, B. (2023) DECM Machine Ready Corpus. \r\nFigshare, Dataset. DOI:10.6084\/m9.figshare.12048729<\/a><\/code><\/pre>\n

The DECM ML-Training corpus<\/h3>\n

This version contains a sample of the RGs manually annotated by multiple researchers using the software of our industry partner, Tagtog. The corpus was used to carry out the project's NLP and ML experiments. The files, which comprise the texts and their annotations, are available in JSON and TSV formats. They are accompanied by the DECM Ontology, which explains the entities and labels used. This corpus can be used for further experimentation with Artificial Intelligence methods.<\/p>\n

Please cite this resource as:<\/p>\n

Murrieta-Flores, P., Liceras-Garrido, R., Favila-V\u00e1zquez, M., Jim\u00e9nez-Badillo, D. (2023) DECM ML Training Corpus. \r\nFigshare, Dataset. DOI:10.6084\/m9.figshare.12366734<\/a><\/code><\/pre>\n

The DECM Annotated Corpus<\/h3>\n

This version contains the entire RG corpus, automatically annotated using the ML models trained on the DECM Gold Standard Corpus. The files are available in JSON and TSV formats and are accompanied by the DECM Ontology file. This corpus can be used for quantitative and qualitative research, as well as for advanced analyses using text mining techniques, corpus linguistics, and other methods such as Geographical Text Analysis.<\/p>\n

Please cite this resource as:<\/p>\n

Murrieta-Flores, P., Jim\u00e9nez-Badillo, D., Martins, B., Favila-V\u00e1zquez, M., Liceras-Garrido, R. (2023) DECM Annotated Corpus. \r\nFigshare, Dataset. DOI:10.6084\/m9.figshare.12366956<\/a><\/code><\/pre>\n

The DECM Ontology and Annotation Rules<\/h3>\n

This .xls file contains two sheets. The first, called ‘Ontology’, defines the entities and labels used to annotate the corpus of the RGs. It comprises 18 entities and labels marking important social, political, territorial, religious, and economic information. The second, called ‘Annotation rules’, includes the basic rules followed by all the annotators in the project, with examples that help annotators make decisions while annotating. These rules were designed to improve annotator consensus, which reached up to 98 per cent for some entities.<\/p>\n

DECM Geographical Text Analysis Software<\/strong><\/h2>\n

The GTA software, developed in two beta versions (v.1 and v.2), combines concepts from Corpus Linguistics, Natural Language Processing, and Geographic Information Systems. Our research group first developed the idea at Lancaster University in the context of the Spatial Humanities project (see Murrieta-Flores et al., 2015<\/a>). A detailed description of the method as implemented in the software can be found in Jim\u00e9nez-Badillo et al., 2021<\/a>, Murrieta-Flores et al., 2022<\/a>, and Murrieta-Flores et al., 2023-forthcoming. The method involves applying Geographic Collocation Analysis. The software's interface combines a corpus viewer, a query interface, and a keyword-in-context tool connected to a map explorer and a historical gazetteer. The tool identifies concepts and\/or terms and their associations with places and their coordinates in very large corpora, allowing users to explore the corpus in different ways and download the results for further analysis in Geographic Information Systems or other tools.<\/p>\n

In the current version, the software works by uploading an annotated corpus and retrieving a historical gazetteer through an API. We are still developing the tool to open it to other users, so that everyone can work with their own tailored annotated corpora and gazetteers or bring in a geographic index from projects such as the World Historical Gazetteer. At the moment, a demo with a sample of the corpus of the Relaciones Geogr\u00e1ficas can be explored here: https:\/\/gta.colonialatlas.com\/v2<\/a><\/strong><\/p>\n

The code for the software can be found on our DECM Github page<\/a>.<\/p>\n

Please contact us if you would like more information and updates on the development of the GTA Software.<\/p>\n

Please cite this resource as:<\/p>\n

Alvarez-Rivera, L., Hern\u00e1ndez-Huerfano, E., Murrieta-Flores, P., Jim\u00e9nez-Badillo, D., and Martins, B. (2023) <\/code>DECM Geographical Text Analysis Software. Figshare, Software. DOI:10.6084\/m9.figshare.21696794<\/a><\/code><\/p>\n

The Relaciones Geogr\u00e1ficas de Nueva Espa\u00f1a Digital Collection<\/strong><\/h2>\n

The Relaciones Geogr\u00e1ficas de la Nueva Espa\u00f1a (1577 – 1585) digital collection<\/a><\/strong> brings together images from the original documents, transcriptions, maps and thematic information from the historical source. This site aims to encourage public interest in these documents and to facilitate the work of historians, ethnologists, archaeologists and linguists interested in the corpus. Each geographical relation is accompanied by a topographical map showing the location of the villages and the geographical features mentioned in the documents. In the case of accounts that include a pictographic map, the map is displayed in high resolution to appreciate all the details easily. In addition, an image of each folio is presented together with the corresponding transcription. Finally, bibliographical references to publications devoted to transcribing, editing or analysing each geographical relation are included.<\/p>\n

Other resources<\/strong><\/h2>\n

Pathways to understanding sixteenth-century Mesoamerica<\/strong><\/a>, funded by the Department of History<\/a> at Lancaster University, is a spin-off project that created a series of three ESRI StoryMaps. These online interactive learning resources combine texts, images and maps on the history, archaeology and geography of the Postclassic and Colonial periods of Central Mexico, from the 14th to the mid-16th century.<\/p>\n

The Subaltern Recogito Dataset and Ontology<\/a><\/strong> were developed as part of the ‘Subaltern Recogito Project’, funded by a Pelagios Commons Resource Development Grant, to explore the annotation of a series of historical maps using Recogito. Our corpus of maps includes those produced in the sixteenth century for the Relaciones Geogr\u00e1ficas de Nueva Espa\u00f1a across the area that is now Mexico. The project took place in collaboration with our colleagues at LLILAS Benson Latin American Studies and Collections at The University of Texas at Austin, the National School of Anthropology and History (ENAH), The National Autonomous University of Mexico (UNAM), the National Institute of Anthropology and History (INAH), and the University of Lisbon. We delivered an online workshop and trained participants to use Recogito for the annotation of the sixteenth-century maps of the Relaciones Geogr\u00e1ficas.<\/p>\n

Twenty-seven scholars from UNAM and ENAH participated, and Patricia Murrieta-Flores introduced the Spatial Humanities and the use of these technologies. From this, the project evolved into a citizen science project in which the participants met online every week to take part in \u2018mappathons\u2019, completing the annotation of a set of sixteenth-century maps now available in the Benson collection.<\/p>\n

Please cite this resource as:<\/p>\n

Murrieta-Flores, P., Favila-V\u00e1zquez, M., Bellamy, K., Jim\u00e9nez-Badillo, D., Martins, B., L\u00f3pez Camacho, J., McDonough, K., and Palacios, Albert A (2019) <\/code>Pelagios Commons: Subaltern Recogito Project. DOI:\u00a0https:\/\/doi.org\/10.18738\/T8\/L2SJQT<\/a>, Texas Data Repository Dataverse, V2.<\/code><\/p>\n

Publications by the project:<\/strong><\/h2>\n