The increasing availability of massive quantities of digital data is one of the main reasons why an infrastructure project such as CLARIAH CORE is needed. The sheer volume of the data makes it impossible to study them in the traditional way. Researchers have to use software to help them find potentially relevant parts and ignore irrelevant ones, or to carry out analysis of the data. But using software to search and analyse massive amounts of digital data also creates new opportunities for breakthroughs in humanities research: such research can be based on more data than was ever possible before, and it can make use of automatic analysis software that is more reliable in certain search and analysis tasks than humans are or ever can be (though in other tasks humans still outperform software).

Data come in many types. The major types are natural language texts, audio-visual data and structured data (databases). All three types are represented in CLARIAH. Though all types occur in all of CLARIAH’s core disciplines, each core discipline has its own dominant data type:

  • Linguistics: natural language texts
  • Social and economic history: structured (often quantitative) data
  • Media Studies: audio-visual data

In addition, a discipline-independent work package deals with data that are useful or needed for all humanities disciplines.

Below are the different descriptions of the use of data in CLARIAH.
The full PDF document can be found here.