Powered by TEITOK
© Maarten Janssen, 2014

Downloads

For researchers who want to process our data with external tools, we offer the corpus below in text format.

Table 1: Corpus distribution by language, century and format:

Language | Century | Original text ISLRN: 375-405-009-147-2 | Standardized text ISLRN: 375-405-009-147-2 | POS annotation ISLRN: 321-583-358-829-1 | Parsed corpus (1) ISLRN: 662-489-499-707-7
Portuguese XVI PT1500_ORIG_TXT.ZIP PT1500_MOD_TXT.ZIP PT1500_POS.ZIP PT1500_PSD.ZIP
Portuguese XVII PT1600_ORIG_TXT.ZIP PT1600_MOD_TXT.ZIP PT1600_POS.ZIP PT1600_PSD.ZIP
Portuguese XVIII PT1700_ORIG_TXT.ZIP PT1700_MOD_TXT.ZIP PT1700_POS.ZIP PT1700_PSD.ZIP
Portuguese XIX PT1800_ORIG_TXT.ZIP PT1800_MOD_TXT.ZIP PT1800_POS.ZIP PT1800_PSD.ZIP
Language | Century | Original text ISLRN: 305-406-112-712-3 | Standardized text ISLRN: 305-406-112-712-3 | POS annotation (2) ISLRN: 042-997-465-008-9 | Parsed corpus ISLRN: 306-113-341-591-4
Spanish XVI ES1500_ORIG_TXT.ZIP ES1500_MOD_TXT.ZIP ES1500_POS.ZIP ES1500_PSD.ZIP
Spanish XVII ES1600_ORIG_TXT.ZIP ES1600_MOD_TXT.ZIP ES1600_POS.ZIP ES1600_PSD.ZIP
Spanish XVIII ES1700_ORIG_TXT.ZIP ES1700_MOD_TXT.ZIP ES1700_POS.ZIP ES1700_PSD.ZIP
Spanish XIX ES1800_ORIG_TXT.ZIP ES1800_MOD_TXT.ZIP ES1800_POS.ZIP ES1800_PSD.ZIP

(1) PSD files are searchable with CorpusSearch. PSDX files are stored in TEITOK and are searchable online.
(2) The revision of the Parts Of Speech annotation of the Spanish corpus is still in progress.

Table 2: Corpus distribution by gender, language, century and format:

Masculine
Language | Century | Original text ISLRN: 375-405-009-147-2 | Standardized text ISLRN: 375-405-009-147-2 | POS annotation ISLRN: 321-583-358-829-1 | Parsed corpus ISLRN: 662-489-499-707-7
Portuguese XVI PT1500m_ORIG_TXT.ZIP PT1500m_MOD_TXT.ZIP PT1500m_POS.ZIP PT1500m_PSD.ZIP
Portuguese XVII PT1600m_ORIG_TXT.ZIP PT1600m_MOD_TXT.ZIP PT1600m_POS.ZIP PT1600m_PSD.ZIP
Portuguese XVIII PT1700m_ORIG_TXT.ZIP PT1700m_MOD_TXT.ZIP PT1700m_POS.ZIP PT1700m_PSD.ZIP
Portuguese XIX PT1800m_ORIG_TXT.ZIP PT1800m_MOD_TXT.ZIP PT1800m_POS.ZIP PT1800m_PSD.ZIP
Language | Century | Original text ISLRN: 305-406-112-712-3 | Standardized text ISLRN: 305-406-112-712-3 | POS annotation ISLRN: 042-997-465-008-9 | Parsed corpus ISLRN:
Spanish XVI ES1500m_ORIG_TXT.ZIP ES1500m_MOD_TXT.ZIP ES1500m_POS.ZIP ES1500m_PSD.ZIP
Spanish XVII ES1600m_ORIG_TXT.ZIP ES1600m_MOD_TXT.ZIP ES1600m_POS.ZIP ES1600m_PSD.ZIP
Spanish XVIII ES1700m_ORIG_TXT.ZIP ES1700m_MOD_TXT.ZIP ES1700m_POS.ZIP ES1700m_PSD.ZIP
Spanish XIX ES1800m_ORIG_TXT.ZIP ES1800m_MOD_TXT.ZIP ES1800m_POS.ZIP ES1800m_PSD.ZIP
Feminine
Language | Century | Original text ISLRN: 375-405-009-147-2 | Standardized text ISLRN: 375-405-009-147-2 | POS annotation ISLRN: 321-583-358-829-1 | Parsed corpus ISLRN: 662-489-499-707-7
Portuguese XVI PT1500f_ORIG_TXT.ZIP PT1500f_MOD_TXT.ZIP PT1500f_POS.ZIP PT1500f_PSD.ZIP
Portuguese XVII PT1600f_ORIG_TXT.ZIP PT1600f_MOD_TXT.ZIP PT1600f_POS.ZIP PT1600f_PSD.ZIP
Portuguese XVIII PT1700f_ORIG_TXT.ZIP PT1700f_MOD_TXT.ZIP PT1700f_POS.ZIP PT1700f_PSD.ZIP
Portuguese XIX PT1800f_ORIG_TXT.ZIP PT1800f_MOD_TXT.ZIP PT1800f_POS.ZIP PT1800f_PSD.ZIP
Language | Century | Original text ISLRN: 305-406-112-712-3 | Standardized text ISLRN: 305-406-112-712-3 | POS annotation ISLRN: 042-997-465-008-9 | Parsed corpus ISLRN: 306-113-341-591-4
Spanish XVI ES1500f_ORIG_TXT.ZIP ES1500f_MOD_TXT.ZIP ES1500f_POS.ZIP ES1500f_PSD.ZIP
Spanish XVII ES1600f_ORIG_TXT.ZIP ES1600f_MOD_TXT.ZIP ES1600f_POS.ZIP ES1600f_PSD.ZIP
Spanish XVIII ES1700f_ORIG_TXT.ZIP ES1700f_MOD_TXT.ZIP ES1700f_POS.ZIP ES1700f_PSD.ZIP
Spanish XIX ES1800f_ORIG_TXT.ZIP ES1800f_MOD_TXT.ZIP ES1800f_POS.ZIP ES1800f_PSD.ZIP

Table 3: Corpus distribution by social status (3), language, century and format:

Nobility
Language | Century | Original text ISLRN: 375-405-009-147-2 | Standardized text ISLRN: 375-405-009-147-2 | POS annotation ISLRN: 321-583-358-829-1 | Parsed corpus ISLRN: 662-489-499-707-7
Portuguese XVI PT1500nob_ORIG_TXT.ZIP PT1500nob_MOD_TXT.ZIP PT1500nob_POS.ZIP PT1500nob_PSD.ZIP
Portuguese XVII PT1600nob_ORIG_TXT.ZIP PT1600nob_MOD_TXT.ZIP PT1600nob_POS.ZIP PT1600nob_PSD.ZIP
Portuguese XVIII PT1700nob_ORIG_TXT.ZIP PT1700nob_MOD_TXT.ZIP PT1700nob_POS.ZIP PT1700nob_PSD.ZIP
Portuguese XIX PT1800nob_ORIG_TXT.ZIP PT1800nob_MOD_TXT.ZIP PT1800nob_POS.ZIP Still no data
Language | Century | Original text ISLRN: 305-406-112-712-3 | Standardized text ISLRN: 305-406-112-712-3 | POS annotation ISLRN: 042-997-465-008-9 | Parsed corpus ISLRN: 306-113-341-591-4
Spanish XVI ES1500nob_ORIG_TXT.ZIP ES1500nob_MOD_TXT.ZIP ES1500nob_POS.ZIP ES1500nob_PSD.ZIP
Spanish XVII ES1600nob_ORIG_TXT.ZIP ES1600nob_MOD_TXT.ZIP ES1600nob_POS.ZIP ES1600nob_PSD.ZIP
Spanish XVIII ES1700nob_ORIG_TXT.ZIP ES1700nob_MOD_TXT.ZIP ES1700nob_POS.ZIP ES1700nob_PSD.ZIP
Spanish XIX ES1800nob_ORIG_TXT.ZIP ES1800nob_MOD_TXT.ZIP ES1800nob_POS.ZIP ES1800nob_PSD.ZIP
Church
Language | Century | Original text ISLRN: 375-405-009-147-2 | Standardized text ISLRN: 375-405-009-147-2 | POS annotation ISLRN: 321-583-358-829-1 | Parsed corpus ISLRN: 662-489-499-707-7
Portuguese XVI PT1500ecc_ORIG_TXT.ZIP PT1500ecc_MOD_TXT.ZIP PT1500ecc_POS.ZIP PT1500ecc_PSD.ZIP
Portuguese XVII PT1600ecc_ORIG_TXT.ZIP PT1600ecc_MOD_TXT.ZIP PT1600ecc_POS.ZIP PT1600ecc_PSD.ZIP
Portuguese XVIII PT1700ecc_ORIG_TXT.ZIP PT1700ecc_MOD_TXT.ZIP PT1700ecc_POS.ZIP PT1700ecc_PSD.ZIP
Portuguese XIX PT1800ecc_ORIG_TXT.ZIP PT1800ecc_MOD_TXT.ZIP PT1800ecc_POS.ZIP PT1800ecc_PSD.ZIP
Language | Century | Original text ISLRN: 305-406-112-712-3 | Standardized text ISLRN: 305-406-112-712-3 | POS annotation ISLRN: 042-997-465-008-9 | Parsed corpus ISLRN: 306-113-341-591-4
Spanish XVI ES1500ecc_ORIG_TXT.ZIP ES1500ecc_MOD_TXT.ZIP ES1500ecc_POS.ZIP ES1500ecc_PSD.ZIP
Spanish XVII ES1600ecc_ORIG_TXT.ZIP ES1600ecc_MOD_TXT.ZIP ES1600ecc_POS.ZIP ES1600ecc_PSD.ZIP
Spanish XVIII ES1700ecc_ORIG_TXT.ZIP ES1700ecc_MOD_TXT.ZIP ES1700ecc_POS.ZIP ES1700ecc_PSD.ZIP
Spanish XIX ES1800ecc_ORIG_TXT.ZIP ES1800ecc_MOD_TXT.ZIP ES1800ecc_POS.ZIP ES1800ecc_PSD.ZIP
Inquisition
Language | Century | Original text ISLRN: 375-405-009-147-2 | Standardized text ISLRN: 375-405-009-147-2 | POS annotation ISLRN: 321-583-358-829-1 | Parsed corpus ISLRN: 662-489-499-707-7
Portuguese XVI PT1500inq_ORIG_TXT.ZIP PT1500inq_MOD_TXT.ZIP PT1500inq_POS.ZIP PT1500inq_PSD.ZIP
Portuguese XVII PT1600inq_ORIG_TXT.ZIP PT1600inq_MOD_TXT.ZIP PT1600inq_POS.ZIP PT1600inq_PSD.ZIP
Portuguese XVIII PT1700inq_ORIG_TXT.ZIP PT1700inq_MOD_TXT.ZIP PT1700inq_POS.ZIP PT1700inq_PSD.ZIP
Portuguese XIX Still no data Still no data Still no data Still no data
Language | Century | Original text ISLRN: 305-406-112-712-3 | Standardized text ISLRN: 305-406-112-712-3 | POS annotation ISLRN: 042-997-465-008-9 | Parsed corpus ISLRN:
Spanish XVI ES1500inq_ORIG_TXT.ZIP ES1500inq_MOD_TXT.ZIP ES1500inq_POS.ZIP Still no data
Spanish XVII ES1600inq_ORIG_TXT.ZIP ES1600inq_MOD_TXT.ZIP ES1600inq_POS.ZIP Still no data
Spanish XVIII ES1700inq_ORIG_TXT.ZIP ES1700inq_MOD_TXT.ZIP ES1700inq_POS.ZIP Still no data
Spanish XIX Still no data Still no data Still no data Still no data
Military
Language | Century | Original text ISLRN: 375-405-009-147-2 | Standardized text ISLRN: 375-405-009-147-2 | POS annotation ISLRN: 321-583-358-829-1 | Parsed corpus ISLRN: 662-489-499-707-7
Portuguese XVI PT1500mil_ORIG_TXT.ZIP PT1500mil_MOD_TXT.ZIP PT1500mil_POS.ZIP PT1500mil_PSD.ZIP
Portuguese XVII PT1600mil_ORIG_TXT.ZIP PT1600mil_MOD_TXT.ZIP PT1600mil_POS.ZIP PT1600mil_PSD.ZIP
Portuguese XVIII PT1700mil_ORIG_TXT.ZIP PT1700mil_MOD_TXT.ZIP PT1700mil_POS.ZIP PT1700mil_PSD.ZIP
Portuguese XIX PT1800mil_ORIG_TXT.ZIP PT1800mil_MOD_TXT.ZIP PT1800mil_POS.ZIP PT1800mil_PSD.ZIP
Language | Century | Original text ISLRN: 305-406-112-712-3 | Standardized text ISLRN: 305-406-112-712-3 | POS annotation ISLRN: 042-997-465-008-9 | Parsed corpus ISLRN: 306-113-341-591-4
Spanish XVI ES1500mil_ORIG_TXT.ZIP ES1500mil_MOD_TXT.ZIP ES1500mil_POS.ZIP ES1500mil_PSD.ZIP
Spanish XVII ES1600mil_ORIG_TXT.ZIP ES1600mil_MOD_TXT.ZIP ES1600mil_POS.ZIP Still no data
Spanish XVIII ES1700mil_ORIG_TXT.ZIP ES1700mil_MOD_TXT.ZIP ES1700mil_POS.ZIP ES1700mil_PSD.ZIP
Spanish XIX ES1800mil_ORIG_TXT.ZIP ES1800mil_MOD_TXT.ZIP ES1800mil_POS.ZIP ES1800mil_PSD.ZIP
Knightly orders
Language | Century | Original text ISLRN: 375-405-009-147-2 | Standardized text ISLRN: 375-405-009-147-2 | POS annotation ISLRN: 321-583-358-829-1 | Parsed corpus ISLRN: 662-489-499-707-7
Portuguese XVI PT1500kni_ORIG_TXT.ZIP PT1500kni_MOD_TXT.ZIP PT1500kni_POS.ZIP PT1500kni_PSD.ZIP
Portuguese XVII PT1600kni_ORIG_TXT.ZIP PT1600kni_MOD_TXT.ZIP PT1600kni_POS.ZIP PT1600kni_PSD.ZIP
Portuguese XVIII PT1700kni_ORIG_TXT.ZIP PT1700kni_MOD_TXT.ZIP PT1700kni_POS.ZIP Still no data
Portuguese XIX PT1800kni_ORIG_TXT.ZIP PT1800kni_MOD_TXT.ZIP PT1800kni_POS.ZIP PT1800kni_PSD.ZIP
Language | Century | Original text ISLRN: 305-406-112-712-3 | Standardized text ISLRN: 305-406-112-712-3 | POS annotation ISLRN: 042-997-465-008-9 | Parsed corpus ISLRN: 306-113-341-591-4
Spanish XVI Still no data Still no data Still no data Still no data
Spanish XVII ES1600kni_ORIG_TXT.ZIP ES1600kni_MOD_TXT.ZIP ES1600kni_POS.ZIP ES1600kni_PSD.ZIP
Spanish XVIII ES1700kni_ORIG_TXT.ZIP ES1700kni_MOD_TXT.ZIP ES1700kni_POS.ZIP Still no data
Spanish XIX Still no data Still no data Still no data Still no data
University
Language | Century | Original text ISLRN: 375-405-009-147-2 | Standardized text ISLRN: 375-405-009-147-2 | POS annotation ISLRN: 321-583-358-829-1 | Parsed corpus ISLRN: 662-489-499-707-7
Portuguese XVI PT1500uni_ORIG_TXT.ZIP PT1500uni_MOD_TXT.ZIP PT1500uni_POS.ZIP PT1500uni_PSD.ZIP
Portuguese XVII PT1600uni_ORIG_TXT.ZIP PT1600uni_MOD_TXT.ZIP PT1600uni_POS.ZIP Still no data
Portuguese XVIII PT1700uni_ORIG_TXT.ZIP PT1700uni_MOD_TXT.ZIP PT1700uni_POS.ZIP Still no data
Portuguese XIX PT1800uni_ORIG_TXT.ZIP PT1800uni_MOD_TXT.ZIP PT1800uni_POS.ZIP Still no data
Language | Century | Original text ISLRN: 305-406-112-712-3 | Standardized text ISLRN: 305-406-112-712-3 | POS annotation ISLRN: 042-997-465-008-9 | Parsed corpus ISLRN: 306-113-341-591-4
Spanish XVI ES1500uni_ORIG_TXT.ZIP ES1500uni_MOD_TXT.ZIP ES1500uni_POS.ZIP ES1500uni_PSD.ZIP
Spanish XVII ES1600uni_ORIG_TXT.ZIP ES1600uni_MOD_TXT.ZIP ES1600uni_POS.ZIP ES1600uni_PSD.ZIP
Spanish XVIII ES1700uni_ORIG_TXT.ZIP ES1700uni_MOD_TXT.ZIP ES1700uni_POS.ZIP ES1700uni_PSD.ZIP
Spanish XIX ES1800uni_ORIG_TXT.ZIP ES1800uni_MOD_TXT.ZIP ES1800uni_POS.ZIP Still no data
Ordinary status
Language | Century | Original text ISLRN: 375-405-009-147-2 | Standardized text ISLRN: 375-405-009-147-2 | POS annotation ISLRN: 321-583-358-829-1 | Parsed corpus ISLRN: 662-489-499-707-7
Portuguese XVI PT1500ord_ORIG_TXT.ZIP PT1500ord_MOD_TXT.ZIP PT1500ord_POS.ZIP PT1500ord_PSD.ZIP
Portuguese XVII PT1600ord_ORIG_TXT.ZIP PT1600ord_MOD_TXT.ZIP PT1600ord_POS.ZIP PT1600ord_PSD.ZIP
Portuguese XVIII PT1700ord_ORIG_TXT.ZIP PT1700ord_MOD_TXT.ZIP PT1700ord_POS.ZIP PT1700ord_PSD.ZIP
Portuguese XIX PT1800ord_ORIG_TXT.ZIP PT1800ord_MOD_TXT.ZIP PT1800ord_POS.ZIP PT1800ord_PSD.ZIP
Language | Century | Original text ISLRN: 305-406-112-712-3 | Standardized text ISLRN: 305-406-112-712-3 | POS annotation ISLRN: 042-997-465-008-9 | Parsed corpus ISLRN: 306-113-341-591-4
Spanish XVI ES1500ord_ORIG_TXT.ZIP ES1500ord_MOD_TXT.ZIP ES1500ord_POS.ZIP ES1500ord_PSD.ZIP
Spanish XVII ES1600ord_ORIG_TXT.ZIP ES1600ord_MOD_TXT.ZIP ES1600ord_POS.ZIP ES1600ord_PSD.ZIP
Spanish XVIII ES1700ord_ORIG_TXT.ZIP ES1700ord_MOD_TXT.ZIP ES1700ord_POS.ZIP ES1700ord_PSD.ZIP
Spanish XIX ES1800ord_ORIG_TXT.ZIP ES1800ord_MOD_TXT.ZIP ES1800ord_POS.ZIP ES1800ord_PSD.ZIP

(3) Apart from the seven social status types included in Table 3, there is an eighth type, slaves, represented by a single female author (Teresa de Jesus Faria) and a single letter (PSCR0620).

Table 4: Balanced corpus (one letter per author) distributed by language, century and format:

Language | Century | Original text ISLRN: 375-405-009-147-2 | Standardized text ISLRN: 375-405-009-147-2 | POS annotation ISLRN: 321-583-358-829-1 | Parsed corpus ISLRN: 662-489-499-707-7
Portuguese XVI PT1500bal_ORIG_TXT.ZIP PT1500bal_MOD_TXT.ZIP PT1500bal_POS.ZIP PT1500bal_PSD.ZIP
Portuguese XVII PT1600bal_ORIG_TXT.ZIP PT1600bal_MOD_TXT.ZIP PT1600bal_POS.ZIP PT1600bal_PSD.ZIP
Portuguese XVIII PT1700bal_ORIG_TXT.ZIP PT1700bal_MOD_TXT.ZIP PT1700bal_POS.ZIP PT1700bal_PSD.ZIP
Portuguese XIX PT1800bal_ORIG_TXT.ZIP PT1800bal_MOD_TXT.ZIP PT1800bal_POS.ZIP PT1800bal_PSD.ZIP
Language | Century | Original text ISLRN: 305-406-112-712-3 | Standardized text ISLRN: 305-406-112-712-3 | POS annotation ISLRN: 042-997-465-008-9 | Parsed corpus ISLRN: 306-113-341-591-4
Spanish XVI ES1500bal_ORIG_TXT.ZIP ES1500bal_MOD_TXT.ZIP ES1500bal_POS.ZIP ES1500bal_PSD.ZIP
Spanish XVII ES1600bal_ORIG_TXT.ZIP ES1600bal_MOD_TXT.ZIP ES1600bal_POS.ZIP ES1600bal_PSD.ZIP
Spanish XVIII ES1700bal_ORIG_TXT.ZIP ES1700bal_MOD_TXT.ZIP ES1700bal_POS.ZIP ES1700bal_PSD.ZIP
Spanish XIX ES1800bal_ORIG_TXT.ZIP ES1800bal_MOD_TXT.ZIP ES1800bal_POS.ZIP ES1800bal_PSD.ZIP
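
The archive names in Tables 1 to 4 follow a regular pattern: a language code, the first year of the century, an optional subset code, and a format suffix. The sketch below builds such names, which may help when scripting over a local copy of the archives. The function name `archive_name` and the lookup tables are illustrative assumptions, not part of the site; no download URL is assumed, and a well-formed name does not guarantee that the archive exists (some cells read "Still no data").

```python
# Codes read off the file names listed in Tables 1 to 4.
LANGUAGES = {"Portuguese": "PT", "Spanish": "ES"}
CENTURIES = {"XVI": "1500", "XVII": "1600", "XVIII": "1700", "XIX": "1800"}
FORMATS = ("ORIG_TXT", "MOD_TXT", "POS", "PSD")
# "" = whole corpus, m/f = gender, nob..ord = social status, bal = balanced.
SUBSETS = ("", "m", "f", "nob", "ecc", "inq", "mil", "kni", "uni", "ord", "bal")


def archive_name(language, century, fmt, subset=""):
    """Return the archive name for one table cell (hypothetical helper)."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset: {subset!r}")
    return f"{LANGUAGES[language]}{CENTURIES[century]}{subset}_{fmt}.ZIP"


if __name__ == "__main__":
    print(archive_name("Portuguese", "XVI", "POS"))
    print(archive_name("Spanish", "XIX", "PSD", subset="bal"))
```

Before relying on a generated name, check the corresponding cell in the tables above.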

Table 5: Corpus documents available for download in XML-TEI P5 format:

Language | Century | XML-TEI P5: Whole corpus | XML-TEI P5: Balanced corpus (4)
Portuguese XVI PT1500_XML-TEI_P5.ZIP PT1500bal_XML-TEI_P5.ZIP
Portuguese XVII PT1600_XML-TEI_P5.ZIP PT1600bal_XML-TEI_P5.ZIP
Portuguese XVIII PT1700_XML-TEI_P5.ZIP PT1700bal_XML-TEI_P5.ZIP
Portuguese XIX PT1800_XML-TEI_P5.ZIP PT1800bal_XML-TEI_P5.ZIP
Language | Century | XML-TEI P5: Whole corpus | XML-TEI P5: Balanced corpus
Spanish XVI ES1500_XML-TEI_P5.ZIP ES1500bal_XML-TEI_P5.ZIP
Spanish XVII ES1600_XML-TEI_P5.ZIP ES1600bal_XML-TEI_P5.ZIP
Spanish XVIII ES1700_XML-TEI_P5.ZIP ES1700bal_XML-TEI_P5.ZIP
Spanish XIX ES1800_XML-TEI_P5.ZIP ES1800bal_XML-TEI_P5.ZIP

(4) The balanced corpus is a subcorpus created by automatically selecting one letter per author, usually the one with the most distinct word types in its standardized edition (cf. section 1.3.1.3. in Manual de Edición y Anotación en TEITOK de los Materiales de P.S. Post Scriptum).
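
The selection criterion in note (4) can be illustrated with a minimal sketch. It is not the project's actual procedure: the `Letter` record, the whitespace tokenization, the lowercasing, and the tie-breaking rule (the first letter seen wins) are all assumptions made for illustration, and the note itself says the criterion applies only "usually".

```python
from collections import namedtuple

# Hypothetical record: a letter identifier, its author, and its standardized text.
Letter = namedtuple("Letter", ["letter_id", "author", "text"])


def count_word_types(text):
    """Count distinct word forms, assuming whitespace tokens compared case-insensitively."""
    return len({token.lower() for token in text.split()})


def select_balanced(letters):
    """Keep one letter per author: the one with the most distinct word types."""
    best = {}
    for letter in letters:
        current = best.get(letter.author)
        # Strict '>' means the earliest letter wins when counts are tied.
        if current is None or count_word_types(letter.text) > count_word_types(current.text):
            best[letter.author] = letter
    return list(best.values())


if __name__ == "__main__":
    sample = [
        Letter("L1", "author_a", "a b a"),
        Letter("L2", "author_a", "a b c"),
        Letter("L3", "author_b", "x"),
    ]
    for letter in select_balanced(sample):
        print(letter.author, letter.letter_id)
```

A real implementation would need the project's own tokenization of the standardized edition, which the manual cited above describes.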

SPELLING VARIANTS

DICER statistics on a standardized version of 478 Portuguese letters: Portuguese Post Scriptum by eDictor. As its name indicates, the manual standardization was carried out with the eDictor tool.