id | title |
---|---|
txtnode |
TxtAST Interface |
TxtAST define AST(Abstract Syntax Tree) for processing in textlint.
Abstract syntax tree is a tree representation of the abstract syntactic structure of text.
Each node of the tree has same interface, is called TxtNode
.
AST explorer for textlint is useful for understanding AST.
TxtNode
is an abstract node.
/**
* Basic TxtNode
* Probably, Real TxtNode implementation has more properties.
*/
interface TxtNode {
type: string;
raw: string;
range: TextNodeRange;
loc: TxtNodeLineLocation;
// parent is runtime information
// Not need in AST
// For example, top Root Node like `Document` has not parent.
parent?: TxtNode;
[index: string]: any;
}
/**
* Location
*/
interface TxtNodeLineLocation {
start: TxtNodePosition;
end: TxtNodePosition;
}
/**
* Position's line start with 1.
* Position's column start with 0.
* This is for compatibility with JavaScript AST.
* https://gist.github.com/azu/8866b2cb9b7a933e01fe
*/
interface TxtNodePosition {
line: number; // start with 1
column: number; // start with 0
}
/**
* Range start with 0
*/
type TextNodeRange = [number, number];
TxtNode
must have these properties.
type
: type of Noderaw
: raw value of Node- if you want to get raw value, please use
getSource(<node>)
instead of it..
- if you want to get raw value, please use
loc
: location objectrange
: location info array like[startIndex, endIndex]
parent
: (optional) parent node of this node.- It is attached in runtime
- Parser user ignore this property
TxtTextNode
is inherit the TxtNode
abstract interface.
/**
* Text Node.
* Text Node has inline value.
* For example, `Str` Node is a TxtTextNode.
*/
interface TxtTextNode extends TxtNode {
value: string;
}
TxtTextNode
must have these properties.
value
: the value of inline node.
Example: Str
node is a TxtTextNode
.
TxtParentNode
is inherit the TxtNode
abstract interface.
/**
* Parent Node.
* Parent Node has children that are consist of TxtNode or TxtTextNode
*/
interface TxtParentNode extends TxtNode {
children: Array<TxtNode | TxtTextNode>;
}
TxtParentNode
must have these properties.
children
: child nodes of this node.
Example: Paragraph
node is a TxtParentNode
.
type
is TxtNode type.
All Types are defined in @textlint/ast-node-types
.
You can use this ASTNodeTypes
value via following way:
import { ASTNodeTypes } from "@textlint/ast-node-types";
console.log(ASTNodeTypes.Str); // "Str"
You can get Node type for Type name by TypeofTxtNode
in TypeScript.
// In TypeScript
import { ASTNodeTypes } from "@textlint/ast-node-types";
const nodeType = TypeofTxtNode<ASTNodeTypes.Str>; // TxtTextNode
These types are defined in @textlint/ast-node-types
.
Type name | Node type | Description |
---|---|---|
ASTNodeTypes.Document | TxtParentNode | Root Node |
ASTNodeTypes.DocumentExit | TxtParentNode | |
ASTNodeTypes.Paragraph | TxtParentNode | Paragraph Node |
ASTNodeTypes.ParagraphExit | TxtParentNode | |
ASTNodeTypes.BlockQuote | TxtParentNode | > Block Quote Node |
ASTNodeTypes.BlockQuoteExit | TxtParentNode | |
ASTNodeTypes.List | TxtParentNode | List Node |
ASTNodeTypes.ListExit | TxtParentNode | |
ASTNodeTypes.ListItem | TxtParentNode | List (each) item Node |
ASTNodeTypes.ListItemExit | TxtParentNode | |
ASTNodeTypes.Header | TxtParentNode | # Header Node |
ASTNodeTypes.HeaderExit | TxtParentNode | |
ASTNodeTypes.CodeBlock | TxtParentNode | Code Block Node |
ASTNodeTypes.CodeBlockExit | TxtParentNode | |
ASTNodeTypes.HtmlBlock | TxtParentNode | HTML Block Node |
ASTNodeTypes.HtmlBlockExit | TxtParentNode | |
ASTNodeTypes.Link | TxtParentNode | Link Node |
ASTNodeTypes.LinkExit | TxtParentNode | |
ASTNodeTypes.Delete | TxtParentNode | Delete Node(~Str~ ) |
ASTNodeTypes.DeleteExit | TxtParentNode | |
ASTNodeTypes.Emphasis | TxtParentNode | Emphasis(*Str* ) |
ASTNodeTypes.EmphasisExit | TxtParentNode | |
ASTNodeTypes.Strong | TxtParentNode | Strong Node(**Str** ) |
ASTNodeTypes.StrongExit | TxtParentNode | |
ASTNodeTypes.Break | TxtNode | Hard Break Node(Str<space><space> ) |
ASTNodeTypes.BreakExit | TxtNode | |
ASTNodeTypes.Image | TxtNode | Image Node |
ASTNodeTypes.ImageExit | TxtNode | |
ASTNodeTypes.HorizontalRule | TxtNode | Horizontal Node(--- ) |
ASTNodeTypes.HorizontalRuleExit | TxtNode | |
ASTNodeTypes.Comment | TxtTextNode | Comment Node |
ASTNodeTypes.CommentExit | TxtTextNode | |
ASTNodeTypes.Str | TxtTextNode | Str Node |
ASTNodeTypes.StrExit | TxtTextNode | |
ASTNodeTypes.Code | TxtTextNode | Inline Code Node |
ASTNodeTypes.CodeExit | TxtTextNode | |
ASTNodeTypes.Html | TxtTextNode | Inline HTML Node |
ASTNodeTypes.HtmlExit | TxtTextNode |
The type is based on HTML tag and Markdown syntax.
Other plugin has define other node type that is not defined in @textlint/ast-node-types
, but you can specify it as just a string.
TxtAST allow to extend node property. But, Following node should have some properties.
depth
: level of header- Example:
<h1>
isdepth:1
,<h2>
isdepth:2
...
- Example:
url
: link url
url
: image url
textlint has built-in parsers.
Package | Version | Description |
---|---|---|
@textlint/markdown-to-ast |
markdown parser | |
@textlint/text-to-ast |
plain text parser |
If you want to get other type, please create new issue.
That TxtNode
interface is defined in packages/ast-node-types.
If you want to use this interface from TypeScript, packages/ast-node-types is useful.
AST explorer for textlint is useful for understanding AST.
Minimum(recommended) rules is following code:
/**
* @param {RuleContext} context
*/
export default function (context) {
const { Syntax } = context;
// root object
return {
[Syntax.Document](node) {},
[Syntax.Paragraph](node) {},
[Syntax.Str](node) {}
};
}
loc
is location info object.
"loc": {
"start": {
"line": 2,
"column": 4
},
"end": {
"line": 2,
"column": 10
}
}
line
of location start with 1 (1-indexed).column
of location start with 0 (0-indexed).
This is for compatibility with JavaScript AST.
Important Note:
Text -> AST TxtNode(0-based columns here) -> textlint -> TextLintMessage(1-based columns)
TxtNode
has 0-based columns, but the result of linting named TextLintMessage
has 1-based columns.
In other word, textlint's rule handle TxtNode
, but formatter handle TextLintMessage
.
Input: *text*
Output: The AST by AST explorer for textlint + Markdown
{
"type": "Document",
"children": [
{
"type": "Paragraph",
"children": [
{
"type": "Emphasis",
"children": [
{
"type": "Str",
"value": "text",
"loc": {
"start": {
"line": 1,
"column": 1
},
"end": {
"line": 1,
"column": 5
}
},
"range": [
1,
5
],
"raw": "text"
}
],
"loc": {
"start": {
"line": 1,
"column": 0
},
"end": {
"line": 1,
"column": 6
}
},
"range": [
0,
6
],
"raw": "*text*"
}
],
"loc": {
"start": {
"line": 1,
"column": 0
},
"end": {
"line": 1,
"column": 6
}
},
"range": [
0,
6
],
"raw": "*text*"
}
],
"loc": {
"start": {
"line": 1,
"column": 0
},
"end": {
"line": 1,
"column": 6
}
},
"range": [
0,
6
],
"raw": "*text*"
}
Illustration
* text *
| |__| |
| value |
|__________|
raw
- Document is a
TxtParentNode
and type is Document- have
children
, but not havevalue
- have
- Paragraph is a
TxtParentNode
and type is Paragraph- have
children
, but not havevalue
- have
- Emphasis is a
TxtTextNode
and type is Emphasis- have
value
- have
- "text" is a
TxtTextNode
and type is Str- have
value
- have
TxtAST
have a minimum of compatibility for unist: Universal Syntax Tree.
We discuss about Unist in Compliances tests for TxtNode #141.
You can use @textlint/ast-tester for testing your processor plugin's parser.
import { test, isTextlintAST } from "@textlint/ast-tester";
// your implement
import yourParse from "your-parser";
// recommenced: test much pattern test
const AST = yourParse("This is text");
// Validate AST
test(AST); // if the AST is invalid, then throw Error
isTextlintAST(AST); // true or false
Other properties is not assured.
For example, markdown's Header
node has level
property, but other format has not it.