HTML Parser
Copy paste from HTML to Slate.
By default, when you paste content into the Slate editor, it uses the clipboard's 'text/plain' data. While this is suitable for certain scenarios, there are times when you want users to be able to paste content while preserving its formatting. To achieve this, your editor should be capable of handling 'text/html' data.
Many Plate plugins include HTML deserialization rules. These rules define how HTML elements and styles are mapped to Plate's node types and attributes.
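As a rough illustration of the choice described above, the sketch below prefers 'text/html' clipboard data and falls back to 'text/plain'. ClipboardLike, pickPasteData and fakeClipboard are hypothetical names for this example; only the getData(type) method mirrors the browser's DataTransfer interface. Plate's own paste handling does more than this.

```typescript
// Minimal stand-in for the browser's DataTransfer (only getData is modeled).
interface ClipboardLike {
  getData(type: string): string;
}

type PasteData = { format: 'html' | 'text'; content: string };

// Hypothetical helper: prefer rich HTML, fall back to plain text.
function pickPasteData(data: ClipboardLike): PasteData {
  const html = data.getData('text/html');
  if (html.trim() !== '') {
    return { format: 'html', content: html };
  }
  return { format: 'text', content: data.getData('text/plain') };
}

// Fake clipboard so the sketch can run outside a browser.
function fakeClipboard(entries: Record<string, string>): ClipboardLike {
  return { getData: (type) => entries[type] ?? '' };
}

const richPaste = pickPasteData(
  fakeClipboard({ 'text/html': '<b>Hi</b>', 'text/plain': 'Hi' })
);
const plainPaste = pickPasteData(fakeClipboard({ 'text/plain': 'Hi' }));
console.log(richPaste.format, plainPaste.format); // html text
```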
HTML -> Slate
Usage
The editor.api.html.deserialize function converts HTML content into a Slate value:
import { createPlateEditor } from '@udecode/plate-common/react';
const editor = createPlateEditor({
  plugins: [
    // all plugins that you want to deserialize
  ],
});
// Pass the HTML through the `element` option, as described in the API section
editor.children = editor.api.html.deserialize({
  element: '<p>Hello, world!</p>',
});
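For orientation, the result is an array of Slate nodes. The sketch below spells out the shape one would expect for the paragraph in the snippet above. The 'p' type string is an assumption (it depends on the configured paragraph plugin key), and TextNode, ElementNode and toPlainText are simplified stand-ins written for this example, not Plate's own types.

```typescript
// Simplified stand-ins for Slate's node types (not imported from Plate).
type TextNode = { text: string; bold?: boolean };
type ElementNode = { type: string; children: Array<ElementNode | TextNode> };

// Assumed result for '<p>Hello, world!</p>' when the paragraph type is 'p'.
const expectedValue: ElementNode[] = [
  { type: 'p', children: [{ text: 'Hello, world!' }] },
];

// Hypothetical helper: read the plain text back out of a value.
function toPlainText(nodes: Array<ElementNode | TextNode>): string {
  return nodes
    .map((n) => ('children' in n ? toPlainText(n.children) : n.text))
    .join('');
}

console.log(toPlainText(expectedValue)); // Hello, world!
```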
Plugin Deserialization Rules
Here's a comprehensive list of plugins that support HTML deserialization, along with their corresponding HTML elements and styles:
Text Formatting
- BoldPlugin: <strong>, <b>, or style="font-weight: 600|700|bold"
- ItalicPlugin: <em>, <i>, or style="font-style: italic"
- UnderlinePlugin: <u> or style="text-decoration: underline"
- StrikethroughPlugin: <s>, <del>, <strike>, or style="text-decoration: line-through"
- SubscriptPlugin: <sub> or style="vertical-align: sub"
- SuperscriptPlugin: <sup> or style="vertical-align: super"
- CodePlugin: <code> or style="font-family: Consolas" (not within a <pre> tag)
- KbdPlugin: <kbd>
Paragraphs and Headings
- ParagraphPlugin: <p>
- HeadingPlugin: <h1>, <h2>, <h3>, <h4>, <h5>, <h6>
Lists
- ListPlugin:
  - Unordered List: <ul>
  - Ordered List: <ol>
  - List Item: <li>
- IndentListPlugin:
  - List Item: <li>
  - Parses the aria-level attribute for indentation
Blocks
- BlockquotePlugin: <blockquote>
- CodeBlockPlugin:
  - Deserializes <pre> elements
  - Deserializes <p> elements with fontFamily: 'Consolas' style
  - Splits content into code lines
  - Preserves language information if available
- HorizontalRulePlugin: <hr>
Links and Media
- LinkPlugin: <a>
- ImagePlugin: <img>
- MediaEmbedPlugin: <iframe>
Tables
- TablePlugin:
  - Table: <table>
  - Table Row: <tr>
  - Table Cell: <td>
  - Table Header Cell: <th>
Text Styling
- FontBackgroundColorPlugin: style="background-color: *"
- FontColorPlugin: style="color: *"
- FontFamilyPlugin: style="font-family: *"
- FontSizePlugin: style="font-size: *"
- FontWeightPlugin: style="font-weight: *"
- HighlightPlugin: <mark>
Layout and Formatting
- AlignPlugin: style="text-align: *"
- LineHeightPlugin: style="line-height: *"
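The style-based entries in the lists above (style="color: *" and similar) amount to reading a CSS declaration off the element and storing its value on the node. The following is a minimal sketch of that idea, not Plate's implementation: parseInlineStyle, STYLE_TO_PROP, stylePropsFor and the node property names are all hypothetical, and the naive split on ';' would mishandle values that themselves contain a semicolon.

```typescript
// Parse an inline style string into a property -> value map.
function parseInlineStyle(style: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const decl of style.split(';')) {
    const idx = decl.indexOf(':');
    if (idx === -1) continue;
    const prop = decl.slice(0, idx).trim().toLowerCase();
    const value = decl.slice(idx + 1).trim();
    if (prop && value) out[prop] = value;
  }
  return out;
}

// Hypothetical mapping from CSS properties to node property names.
const STYLE_TO_PROP: Record<string, string> = {
  'background-color': 'backgroundColor',
  color: 'color',
  'font-family': 'fontFamily',
  'font-size': 'fontSize',
  'font-weight': 'fontWeight',
  'line-height': 'lineHeight',
  'text-align': 'align',
};

// Keep only the declarations that some styling plugin would care about.
function stylePropsFor(style: string): Record<string, string> {
  const parsed = parseInlineStyle(style);
  const props: Record<string, string> = {};
  for (const cssProp of Object.keys(parsed)) {
    const key = STYLE_TO_PROP[cssProp];
    if (key) props[key] = parsed[cssProp];
  }
  return props;
}

const styledProps = stylePropsFor('color: red; text-align: center; margin: 0');
console.log(styledProps); // { color: 'red', align: 'center' }
```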
Deserialization Properties
Plugins can define various properties to control HTML deserialization:
- parse: A function to parse an HTML element into a Plate node
- query: A function that determines if the deserializer should be applied
- isElement: Indicates if the plugin deserializes elements
- isLeaf: Indicates if the plugin deserializes leaf nodes
- attributeNames: List of HTML attribute names to store in node.attributes
- withoutChildren: Excludes child nodes from deserialization
- rules: An array of rule objects that define valid HTML elements and attributes, used for element matching
  - validAttribute: Valid element attributes
  - validClassName: Valid CSS class name
  - validNodeName: Valid HTML tag names
  - validStyle: Valid CSS styles
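To make the rule properties more concrete, here is a simplified model of how a rule could be checked against an element. It is a sketch under stated assumptions: ElementLike is a plain-object stand-in so the code runs without a DOM, the Rule shape is narrower than whatever Plate actually accepts, and matchesRule / matchesAny are hypothetical helpers, not library functions. The sketch assumes that all conditions within one rule must hold and that a deserializer applies if any of its rules matches.

```typescript
// Plain-object stand-in for an HTML element.
interface ElementLike {
  nodeName: string; // upper-case tag name, e.g. 'STRONG'
  className: string;
  attributes: Record<string, string>;
  style: Record<string, string>; // camelCase keys, e.g. fontWeight
}

// Simplified rule shape modeled on the property names listed above.
interface Rule {
  validNodeName?: string | string[];
  validClassName?: string;
  validAttribute?: string | string[];
  validStyle?: Record<string, string | string[]>; // '*' = any non-empty value
}

function asArray(v: string | string[]): string[] {
  return typeof v === 'string' ? [v] : v;
}

function matchesRule(el: ElementLike, rule: Rule): boolean {
  if (rule.validNodeName && asArray(rule.validNodeName).indexOf(el.nodeName) === -1) {
    return false;
  }
  if (rule.validClassName && el.className.split(/\s+/).indexOf(rule.validClassName) === -1) {
    return false;
  }
  if (rule.validAttribute) {
    for (const name of asArray(rule.validAttribute)) {
      if (!(name in el.attributes)) return false;
    }
  }
  const validStyle = rule.validStyle;
  if (validStyle) {
    for (const prop of Object.keys(validStyle)) {
      const value = el.style[prop];
      const allowed = asArray(validStyle[prop]);
      if (!value) return false;
      if (allowed.indexOf('*') === -1 && allowed.indexOf(value) === -1) return false;
    }
  }
  return true;
}

function matchesAny(el: ElementLike, rules: Rule[]): boolean {
  return rules.some((rule) => matchesRule(el, rule));
}

// Rules mirroring the BoldPlugin entry: <strong>, <b>, or a bold font-weight.
const boldRules: Rule[] = [
  { validNodeName: ['STRONG', 'B'] },
  { validStyle: { fontWeight: ['600', '700', 'bold'] } },
];
const boldSpan: ElementLike = {
  nodeName: 'SPAN', className: '', attributes: {}, style: { fontWeight: '700' },
};
const plainSpan: ElementLike = {
  nodeName: 'SPAN', className: '', attributes: {}, style: {},
};
console.log(matchesAny(boldSpan, boldRules), matchesAny(plainSpan, boldRules)); // true false
```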
Extending Deserialization
You can extend or customize the deserialization behavior of any plugin. Here's an example of how you might extend the CodeBlockPlugin:
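// CodeLinePlugin is assumed to be exported from the same package as CodeBlockPlugin
import { CodeBlockPlugin, CodeLinePlugin } from '@udecode/plate-code-block';
const CustomCodeBlockPlugin = CodeBlockPlugin.extend({
  parsers: {
    html: {
      deserializer: {
        // Build the code block node from the matched HTML element
        parse: ({ element }) => {
          // null when the attribute is absent
          const language = element.getAttribute('data-language');
          const textContent = element.textContent || '';
          // One code line per line of text
          const lines = textContent.split('\n');
          return {
            type: CodeBlockPlugin.key,
            language,
            children: lines.map((line) => ({
              type: CodeLinePlugin.key,
              children: [{ text: line }],
            })),
          };
        },
        // Keep the plugin's existing rules and add one for data-language
        rules: [
          ...CodeBlockPlugin.parsers.html.deserializer.rules,
          { validAttribute: 'data-language' },
        ],
      },
    },
  },
});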
This customization adds support for a data-language attribute in code block deserialization and preserves the language information.
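The node that such a parse function returns can be looked at in isolation. The sketch below reproduces the line-splitting step without Plate or a DOM; the literal type strings 'code_block' and 'code_line' are assumptions standing in for CodeBlockPlugin.key and CodeLinePlugin.key, toCodeBlock is a hypothetical helper, and the normalization of Windows line endings is an addition that the example above does not perform.

```typescript
type CodeLine = { type: 'code_line'; children: Array<{ text: string }> };
type CodeBlock = {
  type: 'code_block';
  language: string | null;
  children: CodeLine[];
};

// Hypothetical helper: build a code block node from raw text content.
function toCodeBlock(textContent: string, language: string | null): CodeBlock {
  // Normalize '\r\n' so a stray '\r' does not end up inside a code line.
  const lines = textContent.replace(/\r\n/g, '\n').split('\n');
  return {
    type: 'code_block',
    language,
    children: lines.map(
      (line): CodeLine => ({ type: 'code_line', children: [{ text: line }] })
    ),
  };
}

const codeBlock = toCodeBlock('const a = 1;\r\nconst b = 2;', 'js');
console.log(codeBlock.children.length); // 2
console.log(codeBlock.children[1].children[0].text); // const b = 2;
```

Note that an empty string still yields one empty code line, because ''.split('\n') returns [''].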
Advanced Deserialization Example
The IndentListPlugin provides a more complex deserialization process:
- It transforms HTML list structures into indented paragraphs.
- It handles nested lists by preserving the indentation level.
- It uses the aria-level attribute to determine the indentation level.
Here's a simplified version of its deserialization logic:
export const IndentListPlugin = createTSlatePlugin<IndentListConfig>({
  // ... other configurations ...
  parsers: {
    html: {
      deserializer: {
        isElement: true,
        parse: ({ editor, element, getOptions }) => ({
          indent: Number(element.getAttribute('aria-level')),
          listStyleType: element.style.listStyleType,
          type: editor.getType(ParagraphPlugin),
        }),
        rules: [
          {
            validNodeName: 'LI',
          },
        ],
      },
    },
  },
});
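One detail of the simplified logic is worth spelling out: Number(null) is 0 and Number('abc') is NaN, so a list item without a usable aria-level would end up with an indent of 0 or NaN in this version. Whether the full plugin guards against that elsewhere is not shown here. The sketch below is one possible defensive variant; parseIndent and its fallback of 1 are hypothetical choices for illustration.

```typescript
// Hypothetical helper: turn an aria-level attribute value into an indent level.
function parseIndent(ariaLevel: string | null, fallback: number = 1): number {
  if (ariaLevel === null || ariaLevel.trim() === '') return fallback;
  const n = Number(ariaLevel);
  // Reject NaN, Infinity and levels below 1.
  return isFinite(n) && n >= 1 ? Math.floor(n) : fallback;
}

console.log(Number(null)); // 0
console.log(parseIndent(null), parseIndent('3'), parseIndent('abc')); // 1 3 1
```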
API
editor.api.html.deserialize
Deserializes an HTML string or element into a Slate value.
Parameters
options.element HTMLElement | string
The HTML element or string to deserialize.
options.collapseWhiteSpace optional boolean
Flag to enable or disable the removal of whitespace from the serialized HTML.
- Default: true (Whitespace will be removed.)
Returns
The deserialized Slate value.
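To give a feel for what whitespace removal means in practice, here is a deliberately rough model that collapses runs of whitespace in a text fragment to a single space. It is an assumption-laden sketch, not Plate's algorithm: the real behavior presumably depends on the element context (for example, preformatted content), and collapseWhiteSpaceSketch is a hypothetical name.

```typescript
// Very rough model: collapse each run of spaces, tabs and line breaks
// into one space. Context-sensitive cases such as <pre> are not handled.
function collapseWhiteSpaceSketch(text: string): string {
  return text.replace(/[ \t\r\n\f]+/g, ' ');
}

const collapsed = collapseWhiteSpaceSketch('Hello,\n    world!');
console.log(collapsed); // Hello, world!
```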
Slate -> React -> HTML
Installation
npm install @udecode/plate-html
Usage
// ...
import { HtmlReactPlugin } from '@udecode/plate-html/react';
import { DndProvider } from 'react-dnd';
import { HTML5Backend } from 'react-dnd-html5-backend';
const editor = createPlateEditor({
  plugins: [
    HtmlReactPlugin,
    // all plugins that you want to serialize
  ],
  override: {
    // do not forget to add your custom components, otherwise it won't work
    components: createPlateUI(),
  },
});
const html = editor.api.htmlReact.serialize({
  nodes: editor.children,
  // if you use @udecode/plate-dnd
  dndWrapper: (props) => <DndProvider backend={HTML5Backend} {...props} />,
});
Note: Round-tripping is not yet supported: the HTML serializer will not preserve all information from the Slate value when converting to HTML and back.
API
editor.api.htmlReact.serialize
Converts Slate nodes into an HTML string.
Parameters
Options to control the HTML serialization process.
Returns
An HTML string representing the Slate nodes.