NooJ is a linguistic development environment and corpus processor created by Max Silberztein. It allows linguists to construct grammars in all four classes of the Chomsky-Schützenberger hierarchy of generative grammars (finite-state, context-free, context-sensitive and unrestricted grammars), using either a text editor (e.g. to write regular expressions) or a graph editor.
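The difference between the two lowest classes of the hierarchy can be sketched in a few lines of Python. This is only an illustration of the formal distinction, not NooJ's own grammar notation; the function names and the sample patterns are hypothetical. A finite-state pattern can be written as a regular expression, whereas a pattern with matched nesting (here, n copies of "a" followed by n copies of "b") needs a recursive, context-free rule.

```python
import re

# Finite-state: a regular expression is enough.
# Hypothetical sample pattern: "the", any number of "very", then "big dog".
FINITE_STATE_PATTERN = re.compile(r"the( very)* big dog")


def matches_finite_state(phrase: str) -> bool:
    """Return True if the whole phrase matches the regular expression."""
    return FINITE_STATE_PATTERN.fullmatch(phrase) is not None


def matches_context_free(s: str) -> bool:
    """Recognise the context-free rule S -> "a" S "b" | "" (equal a's and b's).

    No regular expression in the formal sense can express this language,
    because it requires counting to an unbounded depth.
    """
    if s == "":
        return True
    return (
        len(s) >= 2
        and s[0] == "a"
        and s[-1] == "b"
        and matches_context_free(s[1:-1])
    )
```

For example, `matches_finite_state("the very very big dog")` and `matches_context_free("aaabbb")` both succeed, while `matches_context_free("aabbb")` fails because the counts differ.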
NooJ also lets linguists develop orthographical and morphological grammars; dictionaries of simple words, compound words and discontinuous expressions; local syntactic grammars (such as named-entity recognizers); structural syntactic grammars (which produce syntactic trees); and Zellig Harris's transformational grammars.
All NooJ parsers process Atomic Linguistic Units (ALUs) rather than word forms (i.e. sequences of letters between two space characters). NooJ’s syntactic parser can therefore parse a sequence of word forms such as “can not” exactly as it parses contracted word forms such as “cannot” or “can’t”, which lets linguists write relatively simple syntactic grammars, even for agglutinative languages. ALUs are represented by annotations stored in the Text Annotation Structure (TAS): every NooJ parser either adds annotations to the TAS or removes them from it. A typical NooJ analysis applies a series of elementary grammars to a text in cascade, working bottom-up from spelling to semantics.
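The idea of parsers that add and remove annotations in a shared structure can be illustrated with a minimal Python sketch. It does not reproduce NooJ's actual data structures or file formats: the class `TextAnnotationStructure`, the toy lexicons, the label format (such as `can,V`) and the two passes are all assumptions made for illustration. A lexical pass adds ALU annotations, and a later local-grammar pass removes a reading that the context rules out.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    start: int   # character offset where the annotated span begins
    end: int     # character offset just past the annotated span
    label: str   # hypothetical ALU label, e.g. "can,V"


@dataclass
class TextAnnotationStructure:
    text: str
    annotations: list = field(default_factory=list)

    def labels(self) -> list:
        """Return ALU labels ordered by position (stable for equal spans)."""
        ordered = sorted(self.annotations, key=lambda a: (a.start, a.end))
        return [a.label for a in ordered]


# Hypothetical toy lexicons: one word form may stand for several ALUs.
CONTRACTIONS = {"cannot": ["can,V", "not,ADV"], "can't": ["can,V", "not,ADV"]}
SIMPLE_WORDS = {
    "can": ["can,V", "can,N"],   # ambiguous: verb or noun
    "not": ["not,ADV"],
    "she": ["she,PRO"],
    "swim": ["swim,V"],
}


def lexical_pass(tas: TextAnnotationStructure) -> None:
    """First step of the cascade: add one annotation per candidate ALU."""
    for match in re.finditer(r"[A-Za-z']+", tas.text):
        form = match.group().lower()
        alus = CONTRACTIONS.get(form) or SIMPLE_WORDS.get(form, [])
        for label in alus:
            tas.annotations.append(Annotation(match.start(), match.end(), label))


def disambiguation_pass(tas: TextAnnotationStructure) -> None:
    """Second step: remove the noun reading of "can" directly before "not"."""
    def followed_by_not(ann: Annotation) -> bool:
        return any(
            other.label == "not,ADV"
            and other.start >= ann.end
            and tas.text[ann.end:other.start].strip() == ""
            for other in tas.annotations
        )

    tas.annotations = [
        a for a in tas.annotations
        if not (a.label == "can,N" and followed_by_not(a))
    ]


def analyse(text: str) -> list:
    """Apply the two passes in cascade and return the resulting ALU labels."""
    tas = TextAnnotationStructure(text)
    lexical_pass(tas)
    disambiguation_pass(tas)
    return tas.labels()
```

Under these assumptions, `analyse("She can not swim")`, `analyse("She cannot swim")` and `analyse("She can't swim")` all return the same ALU sequence, so a later syntactic pass would not need separate rules for the three spellings.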