# Regular Expression Tokenizer

Tokenizes strings that represent regular expressions.

[![Build Status](https://secure.travis-ci.org/fent/ret.js.svg)](http://travis-ci.org/fent/ret.js)
[![Dependency Status](https://david-dm.org/fent/ret.js.svg)](https://david-dm.org/fent/ret.js)
[![codecov](https://codecov.io/gh/fent/ret.js/branch/master/graph/badge.svg)](https://codecov.io/gh/fent/ret.js)

# Usage

```js
var ret = require('ret');

var tokens = ret(/foo|bar/.source);
```

`tokens` will contain the following object:

```js
{
  "type": ret.types.ROOT,
  "options": [
    [ { "type": ret.types.CHAR, "value": 102 },
      { "type": ret.types.CHAR, "value": 111 },
      { "type": ret.types.CHAR, "value": 111 } ],
    [ { "type": ret.types.CHAR, "value":  98 },
      { "type": ret.types.CHAR, "value":  97 },
      { "type": ret.types.CHAR, "value": 114 } ]
  ]
}
```
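
The `value` fields are character codes, so each alternative can be turned back into text with `String.fromCharCode`. The sketch below is only an illustration: it uses a hand-built copy of the object above, and the local `types` object is a stand-in for `ret.types`, so it runs without the library installed.

```js
// Stand-in for `ret.types` (an assumption for this sketch); the real
// values come from the library.
var types = { ROOT: 'ROOT', CHAR: 'CHAR' };

// Hand-built copy of the token tree shown above for /foo|bar/.
var tokens = {
  type: types.ROOT,
  options: [
    [ { type: types.CHAR, value: 102 },
      { type: types.CHAR, value: 111 },
      { type: types.CHAR, value: 111 } ],
    [ { type: types.CHAR, value:  98 },
      { type: types.CHAR, value:  97 },
      { type: types.CHAR, value: 114 } ]
  ]
};

// Turn each alternative (an array of CHAR tokens) back into a string.
var words = tokens.options.map(function (alternative) {
  return alternative.map(function (token) {
    return String.fromCharCode(token.value);
  }).join('');
});

console.log(words); // [ 'foo', 'bar' ]
```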

# Token Types

`ret.types` is a collection of the various token types exported by ret.

### ROOT

Only used in the root of the regexp. This is needed due to the possibility of the root containing a pipe `|` character. In that case, the token will have an `options` key that is an array of arrays of tokens. If not, it will have a `stack` key that is an array of tokens.

```js
{
  "type": ret.types.ROOT,
  "stack": [token1, token2...],
}
```

```js
{
  "type": ret.types.ROOT,
  "options": [
    [token1, token2...],
    [othertoken1, othertoken2...]
    ...
  ],
}
```
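
Code that walks the tree usually has to handle both shapes. One possible approach, sketched here with a hypothetical helper named `alternatives` (not part of ret), is to normalize a token to an array of alternatives, so a pattern without a pipe becomes a single alternative:

```js
// Hypothetical helper: return the alternatives of a token as an array
// of token arrays, whether or not the pattern contained a pipe.
function alternatives(token) {
  return token.options ? token.options : [token.stack];
}

// Hand-built tokens; the string `type` values are stand-ins for `ret.types`.
var withoutPipe = { type: 'ROOT', stack: [ { type: 'CHAR', value: 97 } ] };
var withPipe = {
  type: 'ROOT',
  options: [
    [ { type: 'CHAR', value: 97 } ],
    [ { type: 'CHAR', value: 98 } ]
  ]
};

console.log(alternatives(withoutPipe).length); // 1
console.log(alternatives(withPipe).length);    // 2
```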

### GROUP

Groups contain the tokens inside a pair of parentheses. If the group begins with `?` followed by another character, it is a special type of group. A `:` tells the group not to be remembered when `exec` is used. `=` means the previous token matches only if followed by this group, and `!` means the previous token matches only if NOT followed by it.

Like root, it can contain an `options` key instead of `stack` if there is a pipe.

```js
{
  "type": ret.types.GROUP,
  "remember": true,
  "followedBy": false,
  "notFollowedBy": false,
  "stack": [token1, token2...],
}
```

```js
{
  "type": ret.types.GROUP,
  "remember": true,
  "followedBy": false,
  "notFollowedBy": false,
  "options": [
    [token1, token2...],
    [othertoken1, othertoken2...]
    ...
  ],
}
```
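
The three flags can be read as the four group syntaxes described above. The following is a minimal sketch, assuming only the flag names shown in this section; `groupKind` and its return strings are hypothetical and not part of ret:

```js
// Hypothetical helper: classify a GROUP token from its flags.
function groupKind(group) {
  if (group.followedBy) return 'followed by';          // (?=...)
  if (group.notFollowedBy) return 'not followed by';   // (?!...)
  return group.remember ? 'remembered' : 'not remembered'; // (...) or (?:...)
}

// Hand-built tokens; only the flags matter for this sketch.
var plain = { remember: true, followedBy: false, notFollowedBy: false, stack: [] };
var lookahead = { remember: false, followedBy: true, notFollowedBy: false, stack: [] };

console.log(groupKind(plain));     // remembered
console.log(groupKind(lookahead)); // followed by
```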

### POSITION

`\b`, `\B`, `^`, and `$` specify positions in the regexp.

```js
{
  "type": ret.types.POSITION,
  "value": "^",
}
```

### SET

Contains a key `set` specifying what tokens are allowed and a key `not` specifying if the set should be negated. A set can contain other sets, ranges, and characters.

```js
{
  "type": ret.types.SET,
  "set": [token1, token2...],
  "not": false,
}
```

### RANGE

Used in set tokens to specify a character range. `from` and `to` are character codes.

```js
{
  "type": ret.types.RANGE,
  "from": 97,
  "to": 122,
}
```
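
To show how SET, RANGE, and CHAR tokens fit together, here is a sketch of a membership test for a character code. The functions `matchesToken` and `setContains` are hypothetical, the local `types` object stands in for `ret.types`, and the set is hand-built in the shape one would expect for `[a-z_]`:

```js
// Stand-in for `ret.types` (an assumption for this sketch).
var types = { SET: 'SET', RANGE: 'RANGE', CHAR: 'CHAR' };

// Does a single token inside a set accept this character code?
function matchesToken(token, code) {
  switch (token.type) {
    case types.CHAR:  return token.value === code;
    case types.RANGE: return token.from <= code && code <= token.to;
    case types.SET:   return setContains(token, code); // nested set
    default:          return false;
  }
}

// A set accepts a code if any member does, inverted when `not` is true.
function setContains(setToken, code) {
  var hit = setToken.set.some(function (member) {
    return matchesToken(member, code);
  });
  return setToken.not ? !hit : hit;
}

// Hand-built token in the shape expected for [a-z_].
var identChars = {
  type: types.SET,
  set: [
    { type: types.RANGE, from: 97, to: 122 },
    { type: types.CHAR, value: 95 }
  ],
  not: false
};

console.log(setContains(identChars, 'q'.charCodeAt(0))); // true
console.log(setContains(identChars, '9'.charCodeAt(0))); // false
```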

### REPETITION

```js
{
  "type": ret.types.REPETITION,
  "min": 0,
  "max": Infinity,
  "value": token,
}
```
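
Reading `min` and `max` as the bounds of an ordinary regexp quantifier (an assumption, under which the example above with `0` and `Infinity` corresponds to `*`), a repetition token can be rendered back to quantifier syntax. `quantifier` is a hypothetical helper, not part of ret:

```js
// Hypothetical helper: render the bounds of a REPETITION token as
// regexp quantifier syntax.
function quantifier(rep) {
  if (rep.min === 0 && rep.max === Infinity) return '*';
  if (rep.min === 1 && rep.max === Infinity) return '+';
  if (rep.min === 0 && rep.max === 1) return '?';
  if (rep.max === Infinity) return '{' + rep.min + ',}';
  if (rep.min === rep.max) return '{' + rep.min + '}';
  return '{' + rep.min + ',' + rep.max + '}';
}

console.log(quantifier({ min: 0, max: Infinity })); // *
console.log(quantifier({ min: 1, max: 3 }));        // {1,3}
```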

### REFERENCE

References a group token. `value` is the number of the referenced group, from 1 to 9.

```js
{
  "type": ret.types.REFERENCE,
  "value": 1,
}
```

### CHAR

Represents a single character token. `value` is the character code. Emitting one token per character may look cluttered compared with concatenating characters into strings. But a repetition token repeats only the last token, unlike the pipe, which applies to the whole clause, so keeping characters separate is simpler.

```js
{
  "type": ret.types.CHAR,
  "value": 123,
}
```
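
The benefit of one token per character shows up when a repetition follows a run of characters. In the sketch below, the stack is hand-built in the shape one would expect for `/ab+/` (an assumption, not output captured from ret): the repetition wraps only the `b` token. `shortestMatch` is a hypothetical helper and `types` stands in for `ret.types`:

```js
// Stand-in for `ret.types` (an assumption for this sketch).
var types = { CHAR: 'CHAR', REPETITION: 'REPETITION' };

// Build the shortest string a stack of CHAR and REPETITION tokens can
// match, by repeating each repeated token `min` times.
function shortestMatch(stack) {
  return stack.map(function (token) {
    if (token.type === types.CHAR) {
      return String.fromCharCode(token.value);
    }
    if (token.type === types.REPETITION) {
      return shortestMatch([token.value]).repeat(token.min);
    }
    return ''; // other token types are ignored in this sketch
  }).join('');
}

// Expected shape for /ab+/: only `b` sits inside the repetition.
var stack = [
  { type: types.CHAR, value: 97 },
  { type: types.REPETITION, min: 1, max: Infinity,
    value: { type: types.CHAR, value: 98 } }
];

console.log(shortestMatch(stack)); // ab
```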

## Errors

ret.js will throw an error if given a string containing an invalid regular expression. The possible errors are:

* Invalid group. Thrown when the `?` at the start of a group is followed by an invalid character. It can only be followed by `!`, `=`, or `:`. Example: `/(?_abc)/`
* Nothing to repeat. Thrown when a repetition token is the first token in the current clause, that is, at the very beginning of the regexp or of a group, or right after a pipe. Examples: `/foo|?bar/`, `/{1,3}foo|bar/`, `/foo(+bar)/`
* Unmatched ). A group was closed without having been opened. Example: `/hello)2u/`
* Unterminated group. A group was not closed. Example: `/(1(23)4/`
* Unterminated character class. A custom character set was not closed. Example: `/[abc/`
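
Since invalid input throws, callers handling untrusted patterns may want to wrap the call. Below is a minimal sketch. `tryTokenize` is a hypothetical helper that takes the tokenizer as an argument, and `stubTokenize` is a stand-in so the example runs without the library; its error message is a placeholder, not the text ret produces.

```js
// Hypothetical wrapper: call a tokenizer and report failure as a value
// instead of letting the exception propagate.
function tryTokenize(tokenize, source) {
  try {
    return { ok: true, tokens: tokenize(source) };
  } catch (err) {
    return { ok: false, message: err.message };
  }
}

// Stand-in tokenizer that always fails; with the real library one would
// pass `require('ret')` instead.
function stubTokenize(source) {
  throw new Error('placeholder error for ' + source);
}

var result = tryTokenize(stubTokenize, '(1(23)4');
console.log(result.ok);      // false
console.log(result.message); // placeholder error for (1(23)4
```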


# Install

    npm install ret


# Tests

Tests are written with [vows](http://vowsjs.org/).

```bash
npm test
```

# License

MIT