Catalog Reference : ELRA-W0043
PAROLE Italian Corpus
The PAROLE Italian Corpus comprises 3,135,651 words collected from four different domains:
• newspapers: 2,179,800 words from La Stampa, La Repubblica, Il Corriere della Sera, L’Unione Sarda, Il Sole 24ore, between 1992 and 1996,
• periodicals: 143,810 words from Casaviva, 100cose, Epoca, Espansione, Grazia, Panorama, Starbene, Storia Illustrata, Zerouno, between 1985 and 1988,
• books: 564,964 words, between 1970 and 1989,
• miscellaneous: 247,077 words from CNR documents, Patents, Maritime documents, Theater, between 1987 and 1997.
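The four domain counts above add up exactly to the stated total of 3,135,651 words. The short sketch below checks that and computes each domain's share of the corpus, along with the share covered by the roughly 250,000 annotated words. The figures are copied from this entry. The names `DOMAIN_WORDS`, `TOTAL_WORDS`, `ANNOTATED_WORDS` and `domain_shares` are hypothetical and do not belong to any ELRA tool or to the corpus distribution.

```python
# Word counts per domain, copied from the catalogue entry (hypothetical names).
DOMAIN_WORDS = {
    "newspapers": 2_179_800,
    "periodicals": 143_810,
    "books": 564_964,
    "miscellaneous": 247_077,
}
TOTAL_WORDS = 3_135_651      # stated corpus size
ANNOTATED_WORDS = 250_000    # stated as "about" 250,000, so only approximate


def domain_shares(counts, total):
    """Return each domain's share of the corpus as a percentage."""
    # Guard against inconsistent figures before computing proportions.
    if sum(counts.values()) != total:
        raise ValueError("domain counts do not add up to the stated total")
    return {name: 100 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    for name, pct in domain_shares(DOMAIN_WORDS, TOTAL_WORDS).items():
        print(f"{name:<14} {pct:5.1f}%")
    print(f"{'annotated':<14} {100 * ANNOTATED_WORDS / TOTAL_WORDS:5.1f}%")
```

Rounded to one decimal place, newspapers account for about 69.5% of the corpus, books 18.0%, miscellaneous texts 7.9% and periodicals 4.6%. The annotated subset covers roughly 8% of the whole.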
About 250,000 words were morphosyntactically annotated and lemmatized.
Identification
Period of coverage :
Version :
Version history :
Update frequency : every 3 years
Last update : 2004
Production
Project :
PAROLE
Creation date :
1996-1998
Applications
Applications existing :
Information retrieval
Technical Information
Platform :
PC
Fileformat :
Plain text
Contents
written corpus
Number of languages : Monolingual
Language(s) :
Italian
Tagset :
ILC tagset and Parole tagset (EAGLES conformant)
Number of tokens :
3,135,651 words
Text annotation scheme : TEI
Text annotation language : SGML
Member Prices
Academic - Research 100.00 EUR
Commercial - Research 100.00 EUR
Non-Member Prices
Academic - Research 150.00 EUR
Commercial - Research 150.00 EUR
Tuesday 21 May, 2013
Copyright © 2008 ELRA