About StringNet 4.0 and StringNet Navigator

What is StringNet?

StringNet is an English lexico-grammatical knowledge base consisting of multiword patterns of word behavior. These patterns are represented by what we call hybrid n-grams and by their relations to each other. Currently, StringNet contains about two billion hybrid n-grams extracted from the British National Corpus (BNC), with each hybrid n-gram linked to all of its tokens attested in the BNC. The design and motivation of (an earlier version of) StringNet are described in Wible and Tsao (2010).

What are Hybrid n-grams?

The multiword patterns that we call hybrid n-grams are sequences of grams, each of which may be (1) a specific word form (e.g., ‘trying’ but not ‘tried’, ‘tries’, or ‘try’); (2) a lexeme (e.g., try, covering all of its forms: trying, tried, etc.); or (3) a part of speech (POS), marked off in brackets. A POS gram may be specific, such as [v-ing], or more general, such as [verb], which covers several more specific POSs, including [v-ing]. An example hybrid n-gram is:


there be no point in [v-ing]

Important: Click on this hybrid n-gram anywhere and wait a bit. A pop-up shows all the forms attested in that slot. Try it above.
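
As a rough illustration of the three gram types, the following Python sketch represents a hybrid n-gram as a list of grams and checks whether a POS-tagged word sequence is a token of it. Everything here is an assumption made for illustration: the names Gram, gram_matches and matches, the tiny lemma table, the tag labels ("ex", "det", etc.), and the choice to treat ‘be’ as a lexeme and the other words as word forms. It is not StringNet's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Toy resources invented for this sketch (not StringNet's actual data).
LEMMA_OF = {"is": "be", "was": "be", "were": "be", "tried": "try", "trying": "try"}
GENERAL_POS_OF = {"v-ing": "verb", "v-ed": "verb"}  # specific tag -> general tag


@dataclass(frozen=True)
class Gram:
    kind: str   # "form", "lexeme", or "pos"
    value: str  # the word form, the lexeme, or the POS label


def gram_matches(gram: Gram, word: str, pos: str) -> bool:
    """Check one gram against one tagged word."""
    word = word.lower()
    if gram.kind == "form":
        return word == gram.value
    if gram.kind == "lexeme":
        # Any inflected form of the lexeme counts as a match.
        return LEMMA_OF.get(word, word) == gram.value
    if gram.kind == "pos":
        # A general slot such as [verb] also accepts specific tags such as v-ing.
        return gram.value in (pos, GENERAL_POS_OF.get(pos))
    raise ValueError(f"unknown gram kind: {gram.kind}")


def matches(pattern: List[Gram], tokens: List[Tuple[str, str]]) -> bool:
    """True if the tagged word sequence is a token of the hybrid n-gram."""
    return len(pattern) == len(tokens) and all(
        gram_matches(g, word, pos) for g, (word, pos) in zip(pattern, tokens)
    )


# there be no point in [v-ing]
POINT_IN = [
    Gram("form", "there"), Gram("lexeme", "be"), Gram("form", "no"),
    Gram("form", "point"), Gram("form", "in"), Gram("pos", "v-ing"),
]

tagged = [("There", "ex"), ("was", "v-ed"), ("no", "det"),
          ("point", "noun"), ("in", "prep"), ("arguing", "v-ing")]
print(matches(POINT_IN, tagged))  # True
```

Because ‘be’ is treated as a lexeme, ‘is’, ‘was’ and ‘were’ all match that position, while the final slot accepts only words tagged v-ing.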

What about the latest version, StringNet 4.0?

Like all previous versions, StringNet 4.0 takes an English word (or words) as a query and responds with a ranked list of multiword and lexico-grammatical patterns in which that word is conventionally used (or in which those words conventionally co-occur), along with concordances for each pattern. As a ‘net’, StringNet 4.0 still links each pattern to its related patterns: to its more abstract counterparts (its parents) and to its more specific counterparts (its children). For example, it links ‘consider yourself lucky’ to its parents ‘consider [pron reflx] lucky’, ‘[verb] yourself lucky’ and ‘consider yourself [adj]’. Click on the following links beside any pattern to find these related patterns: parents, children, expand, contract. Also, clicking on any word or slot in any pattern displays its paradigm, a list of the substitutable words representing the attested variation for that slot in that exact context.
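
The parent relation in the ‘consider yourself lucky’ example can be sketched as replacing one word at a time with its POS slot. The Python below is a minimal sketch under stated assumptions: the function name candidate_parents and the (word, POS) input format are hypothetical, and only one kind of abstraction step (word to POS slot) is shown. StringNet itself links only patterns attested in the BNC, so this generates candidates rather than reproducing its index.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word as it appears in the pattern, POS label for that word)


def candidate_parents(pattern: List[Token]) -> List[str]:
    """Abstract one position at a time by replacing the word with its POS slot."""
    parents = []
    for i, (_, pos) in enumerate(pattern):
        grams = [word for word, _ in pattern]
        if grams[i].startswith("["):
            continue  # this position is already a POS slot
        grams[i] = f"[{pos}]"
        parents.append(" ".join(grams))
    return parents


pattern = [("consider", "verb"), ("yourself", "pron reflx"), ("lucky", "adj")]
for parent in candidate_parents(pattern):
    print(parent)
# [verb] yourself lucky
# consider [pron reflx] lucky
# consider yourself [adj]
```

Children run in the opposite direction: each slot in a pattern is filled with one of the more specific items attested there.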

New function in 4.0: Collocations
StringNet 4.0 adds a collocation search that takes a query word and returns two-word collocations containing it (for a query of ‘resemblance’, it gives ‘striking resemblance’, ‘passing resemblance’, etc.). Additionally, each collocation is linked to contextual patterns that contain it: ‘bear a striking resemblance to’ and ‘more than a passing resemblance to’, for example. Be sure to check Search Options for Collocation searches. There you can specify the part of speech of the target word and of the collocates to be found, and indicate their linear ordering (find collocates preceding the target word or following it).
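
The effect of these search options can be pictured with a small filter over a list of two-word collocations. This is only a sketch: the Collocation record, the find_collocations function and its parameter names are hypothetical, and the three toy entries (two taken from the examples above, one derived from the contextual patterns) stand in for StringNet's real data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Collocation:
    first: str
    first_pos: str
    second: str
    second_pos: str


# Toy entries for illustration only.
TOY_COLLOCATIONS = [
    Collocation("striking", "adj", "resemblance", "noun"),
    Collocation("passing", "adj", "resemblance", "noun"),
    Collocation("resemblance", "noun", "to", "prep"),
]


def find_collocations(
    entries: List[Collocation],
    target: str,
    collocate_pos: Optional[str] = None,  # restrict the collocate's POS
    position: Optional[str] = None,       # "before", "after", or None for either
) -> List[str]:
    """Return collocations containing the target word, filtered by POS and order."""
    found = []
    for c in entries:
        # Collocate precedes the target word.
        if c.second == target and position in (None, "before"):
            if collocate_pos in (None, c.first_pos):
                found.append(f"{c.first} {c.second}")
        # Collocate follows the target word.
        if c.first == target and position in (None, "after"):
            if collocate_pos in (None, c.second_pos):
                found.append(f"{c.first} {c.second}")
    return found


print(find_collocations(TOY_COLLOCATIONS, "resemblance", position="before"))
# ['striking resemblance', 'passing resemblance']
print(find_collocations(TOY_COLLOCATIONS, "resemblance", position="after"))
# ['resemblance to']
```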

Other new features
StringNet 4.0 gives users more flexibility to decide (1) how patterns are ranked (for example, there is now an option to rank results by pure frequency) and (2) what sorts of patterns to display (for example, users can ask to be shown only patterns with words and no POS slots, or only patterns of one length, such as only 4-grams or only 3-grams).
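
A minimal sketch of what these display options amount to, assuming results arrive as (pattern, frequency) pairs. The function select_patterns, its parameters, and the frequency counts below are all invented for illustration and do not come from StringNet.

```python
import re
from typing import List, Optional, Tuple


def grams_of(pattern: str) -> List[str]:
    """Split a pattern into grams, keeping a bracketed slot such as [pron reflx] whole."""
    return re.findall(r"\[[^\]]*\]|\S+", pattern)


def select_patterns(
    results: List[Tuple[str, int]],
    words_only: bool = False,      # hide patterns that contain POS slots
    length: Optional[int] = None,  # keep only n-grams of this length
) -> List[Tuple[str, int]]:
    """Filter patterns, then rank the survivors by pure frequency."""
    kept = []
    for pattern, freq in results:
        grams = grams_of(pattern)
        if words_only and any(g.startswith("[") for g in grams):
            continue
        if length is not None and len(grams) != length:
            continue
        kept.append((pattern, freq))
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Frequencies here are made up purely to exercise the function.
TOY_RESULTS = [
    ("take place [prep]", 120),
    ("take part in", 300),
    ("take advantage of", 250),
    ("take the [adj] step of [v-ing]", 15),
]

print(select_patterns(TOY_RESULTS, words_only=True))
# [('take part in', 300), ('take advantage of', 250)]
```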

What is StringNet Navigator?

StringNet Navigator is the user interface for querying and navigating StringNet (http://nav.stringnet.org). It takes queries of one or more words submitted to its query box and provides a list of patterns in which the query word is conventionally used (or, in the case of multi-word queries, patterns in which the query words conventionally co-occur). For example, a query of ‘take’ yields ‘take place [prep]’, ‘take part in’, ‘take advantage of’, and many others. Each hybrid n-gram listed in the search results is accompanied by a variety of related links and information. These links are what make StringNet a net.

The Navigable Links among Patterns that Make StringNet a Net (New)

The two figures below illustrate the links available between and among patterns that show up in the search results.




Each of the roughly two billion patterns in StringNet is indexed (linked) to other related patterns by four basic types of relations.

1. Parents: more abstract versions of the pattern
2. Children: more specific versions of the pattern
3. Expand: longer versions of the pattern
4. Contract: shorter versions of the pattern

For example, for a query of the word ‘step’, the first pattern listed in the results is “step by step” and the second is “take the unprecedented step of [v-ing]”.

Here are examples of patterns that are related in each of the four ways to the hybrid n-gram “take the unprecedented step of [v-ing]”:
    Some parents of it: “[verb] the unprecedented step of [v-ing]” and “take the [adj] step of [v-ing]”
    A child of it: “took the unprecedented step of [v-ing]”
    Contracted (shorter version): “take the step of [v-ing]”
    Expanded (longer version): “[noun] take the unprecedented step of [v-ing]”
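
The four relations can be told apart mechanically, as the Python sketch below tries to show for the ‘step’ examples. It rests on assumptions not stated in this document: that parents and children differ from the base pattern only in how abstract individual grams are, that expanded and contracted patterns differ only by added or removed grams, and that a tiny hand-written abstraction table (ABSTRACTS_TO) can stand in for StringNet's real lexeme and POS information. The function names are hypothetical.

```python
from typing import List, Optional

# Toy abstraction steps (assumed): word form -> lexeme -> POS slot.
ABSTRACTS_TO = {"took": "take", "take": "[verb]", "unprecedented": "[adj]"}


def is_more_abstract(general: str, specific: str) -> bool:
    """True if `general` is reachable from `specific` by one or more abstraction steps."""
    current = ABSTRACTS_TO.get(specific)
    while current is not None:
        if current == general:
            return True
        current = ABSTRACTS_TO.get(current)
    return False


def is_subsequence(short: List[str], long: List[str]) -> bool:
    """True if `short` can be obtained from `long` by deleting grams."""
    remaining = iter(long)
    return all(gram in remaining for gram in short)


def relation(base: str, other: str) -> Optional[str]:
    """How `other` relates to `base`: 'parent', 'child', 'expand', 'contract', or None."""
    # Whitespace splitting is enough here because every slot is a single token.
    a, b = base.split(), other.split()
    if len(a) == len(b):
        diffs = [(x, y) for x, y in zip(a, b) if x != y]
        if diffs and all(is_more_abstract(y, x) for x, y in diffs):
            return "parent"
        if diffs and all(is_more_abstract(x, y) for x, y in diffs):
            return "child"
        return None
    if len(b) > len(a) and is_subsequence(a, b):
        return "expand"
    if len(b) < len(a) and is_subsequence(b, a):
        return "contract"
    return None


BASE = "take the unprecedented step of [v-ing]"
for other in [
    "[verb] the unprecedented step of [v-ing]",
    "take the [adj] step of [v-ing]",
    "took the unprecedented step of [v-ing]",
    "take the step of [v-ing]",
    "[noun] take the unprecedented step of [v-ing]",
]:
    print(relation(BASE, other), "<-", other)
```

Run on the examples above, this labels the first two patterns as parents, the third as a child, the fourth as contract and the fifth as expand.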


Who are we?

David Wible is Distinguished Professor of Learning and Instruction at National Central University (NCU) in Taiwan and Dean of the College of Liberal Arts.
Email: wible@stringnet.org

Nai-Lung Tsao is a research assistant at the Graduate Institute of Learning and Instruction at National Central University in Taiwan.
Email: beaktsao@stringnet.org

We have developed StringNet as one of a suite of forthcoming tools for various aspects of second language vocabulary learning, teaching, and materials development. Our ongoing research and development has been supported by grants from Taiwan's National Science Council.


References

  1. Nai-Lung Tsao and David Wible. "Word similarity using constructions as contextual features", The Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora, Trento, Italy, November 2013.
  2. David Wible and Nai-Lung Tsao. "StringNet as a Computational Resource for Discovering and Investigating Linguistic Constructions", The NAACL HLT Workshop on Extracting and Using Constructions in Computational Linguistics, Los Angeles, June 1-June 6, 2010.
  3. Nai-Lung Tsao and David Wible. "A Method for Unsupervised Broad-Coverage Lexical Error Detection and Correction", The NAACL HLT Workshop on Innovative Use of NLP for Building Educational Applications, Boulder, Colorado, May 31-June 5, 2009.


A Bit of History: All Versions of StringNet
  1. LexChecker (StringNet 1.0): November 1, 2009
  2. StringNet 2.0: February 1, 2011
  3. StringNet 3.0: May 16, 2012
