Mistune¶
The fastest markdown parser in pure Python with renderer features, inspired by marked.
Features¶
- Pure Python. Tested in Python 2.7, Python 3.5+ and PyPy.
- Very Fast. It is the fastest in all pure Python markdown parsers.
- More Features. Table, footnotes, autolink, fenced code etc.
View the benchmark results.
Installation¶
Installing mistune with pip:
$ pip install mistune
Mistune can be faster, if you compile with cython:
$ pip install cython mistune
Basic Usage¶
A simple API that render a markdown formatted text:
import mistune
mistune.markdown('I am using **mistune markdown parser**')
# output: <p>I am using <strong>mistune markdown parser</strong></p>
If you care about performance, it is better to re-use the Markdown instance:
import mistune
markdown = mistune.Markdown()
markdown('I am using **mistune markdown parser**')
Mistune has enabled all features by default. You don’t have to configure anything. But there are options for you to change the parser behaviors.
Options¶
Here is a list of all options that will affect the rendering results,
configure them with mistune.Renderer
:
renderer = mistune.Renderer(escape=True, hard_wrap=True)
# use this renderer instance
markdown = mistune.Markdown(renderer=renderer)
markdown(text)
- escape: if set to False, all raw html tags will not be escaped.
- hard_wrap: if set to True, it will has GFM line breaks feature.
All new lines will be replaced with
<br>
tag - use_xhtml: if set to True, all tags will be in xhtml, for example:
<hr />
. - parse_block_html: parse text only in block level html.
- parse_inline_html: parse text only in inline level html.
When using the default renderer, you can use one of the following shortcuts:
mistune.markdown(text, escape=True, hard_wrap=True)
markdown = mistune.Markdown(escape=True, hard_wrap=True)
markdown(text)
Renderer¶
Like misaka/sundown, you can influence the rendering by custom renderers. All you need to do is subclassing a Renderer class.
Here is an example of code highlighting:
import mistune
from pygments import highlight
from pygments.lexers import get_lexer_by_name
from pygments.formatters import html
class HighlightRenderer(mistune.Renderer):
def block_code(self, code, lang):
if not lang:
return '\n<pre><code>%s</code></pre>\n' % \
mistune.escape(code)
lexer = get_lexer_by_name(lang, stripall=True)
formatter = html.HtmlFormatter()
return highlight(code, lexer, formatter)
renderer = HighlightRenderer()
markdown = mistune.Markdown(renderer=renderer)
print(markdown('```python\nassert 1 == 1\n```'))
Find more renderers in mistune-contrib.
Block Level¶
Here is a list of block level renderer API:
block_code(code, language=None)
block_quote(text)
block_html(html)
header(text, level, raw=None)
hrule()
list(body, ordered=True)
list_item(text)
paragraph(text)
table(header, body)
table_row(content)
table_cell(content, **flags)
The flags tells you whether it is header with flags['header']
. And it
also tells you the align with flags['align']
.
Span Level¶
Here is a list of span level renderer API:
autolink(link, is_email=False)
codespan(text)
double_emphasis(text)
emphasis(text)
image(src, title, alt_text)
linebreak()
newline()
link(link, title, content)
strikethrough(text)
text(text)
inline_html(text)
Footnotes¶
Here is a list of renderers related to footnotes:
footnote_ref(key, index)
footnote_item(key, text)
footnotes(text)
Lexers¶
Sometimes you want to add your own rules to Markdown, such as GitHub Wiki links. You can’t achieve this goal with renderers. You will need to deal with the lexers, it would be a little difficult for the first time.
We will take an example for GitHub Wiki links: [[Page 2|Page 2]]
.
It is an inline grammar, which requires custom InlineGrammar
and
InlineLexer
:
import copy,re
from mistune import Renderer, InlineGrammar, InlineLexer
class WikiLinkRenderer(Renderer):
def wiki_link(self, alt, link):
return '<a href="%s">%s</a>' % (link, alt)
class WikiLinkInlineLexer(InlineLexer):
def enable_wiki_link(self):
# add wiki_link rules
self.rules.wiki_link = re.compile(
r'\[\[' # [[
r'([\s\S]+?\|[\s\S]+?)' # Page 2|Page 2
r'\]\](?!\])' # ]]
)
# Add wiki_link parser to default rules
# you can insert it some place you like
# but place matters, maybe 3 is not good
self.default_rules.insert(3, 'wiki_link')
def output_wiki_link(self, m):
text = m.group(1)
alt, link = text.split('|')
# you can create an custom render
# you can also return the html if you like
return self.renderer.wiki_link(alt, link)
You should pass the inline lexer to Markdown
parser:
renderer = WikiLinkRenderer()
inline = WikiLinkInlineLexer(renderer)
# enable the feature
inline.enable_wiki_link()
markdown = Markdown(renderer, inline=inline)
markdown('[[Link Text|Wiki Link]]')
It is the same with block level lexer. It would take a while to understand the whole mechanism. But you won’t do the trick a lot.
Contribution & Extensions¶
Mistune itself doesn’t accept any extension. It will always be a simple one file script.
If you want to add features, you can head over to mistune-contrib.
Here are some extensions already in mistune-contrib:
- Math/MathJax features
- Highlight Code Renderer
- TOC table of content features
- MultiMarkdown Metadata parser
Get inspired with the contrib repository.
Developer Guide¶
Here is the API reference for mistune.
-
class
mistune.
Renderer
(**kwargs)¶ The default HTML renderer for rendering Markdown.
-
autolink
(link, is_email=False)¶ Rendering a given link or email address.
Parameters: - link – link content or email address.
- is_email – whether this is an email or not.
-
block_code
(code, lang=None)¶ Rendering block level code.
pre > code
.Parameters: - code – text content of the code block.
- lang – language of the given code.
-
block_html
(html)¶ Rendering block level pure html content.
Parameters: html – text content of the html snippet.
-
block_quote
(text)¶ Rendering <blockquote> with the given text.
Parameters: text – text content of the blockquote.
-
codespan
(text)¶ Rendering inline code text.
Parameters: text – text content for inline code.
-
double_emphasis
(text)¶ Rendering strong text.
Parameters: text – text content for emphasis.
-
emphasis
(text)¶ Rendering emphasis text.
Parameters: text – text content for emphasis.
-
escape
(text)¶ Rendering escape sequence.
Parameters: text – text content.
-
footnote_item
(key, text)¶ Rendering a footnote item.
Parameters: - key – identity key for the footnote.
- text – text content of the footnote.
-
footnote_ref
(key, index)¶ Rendering the ref anchor of a footnote.
Parameters: - key – identity key for the footnote.
- index – the index count of current footnote.
-
footnotes
(text)¶ Wrapper for all footnotes.
Parameters: text – contents of all footnotes.
-
header
(text, level, raw=None)¶ Rendering header/heading tags like
<h1>
<h2>
.Parameters: - text – rendered text content for the header.
- level – a number for the header level, for example: 1.
- raw – raw text content of the header.
-
hrule
()¶ Rendering method for
<hr>
tag.
-
image
(src, title, text)¶ Rendering a image with title and text.
Parameters: - src – source link of the image.
- title – title text of the image.
- text – alt text of the image.
-
inline_html
(html)¶ Rendering span level pure html content.
Parameters: html – text content of the html snippet.
-
linebreak
()¶ Rendering line break like
<br>
.
-
link
(link, title, text)¶ Rendering a given link with content and title.
Parameters: - link – href link for
<a>
tag. - title – title content for title attribute.
- text – text content for description.
- link – href link for
-
list
(body, ordered=True)¶ Rendering list tags like
<ul>
and<ol>
.Parameters: - body – body contents of the list.
- ordered – whether this list is ordered or not.
-
list_item
(text)¶ Rendering list item snippet. Like
<li>
.
-
newline
()¶ Rendering newline element.
-
paragraph
(text)¶ Rendering paragraph tags. Like
<p>
.
-
placeholder
()¶ Returns the default, empty output value for the renderer.
All renderer methods use the ‘+=’ operator to append to this value. Default is a string so rendering HTML can build up a result string with the rendered Markdown.
Can be overridden by Renderer subclasses to be types like an empty list, allowing the renderer to create a tree-like structure to represent the document (which can then be reprocessed later into a separate format like docx or pdf).
-
strikethrough
(text)¶ Rendering ~~strikethrough~~ text.
Parameters: text – text content for strikethrough.
-
table
(header, body)¶ Rendering table element. Wrap header and body in it.
Parameters: - header – header part of the table.
- body – body part of the table.
-
table_cell
(content, **flags)¶ Rendering a table cell. Like
<th>
<td>
.Parameters: - content – content of current table cell.
- header – whether this is header or not.
- align – align of current table cell.
-
table_row
(content)¶ Rendering a table row. Like
<tr>
.Parameters: content – content of current table row.
-
text
(text)¶ Rendering unformatted text.
Parameters: text – text content.
-
-
class
mistune.
Markdown
(renderer=None, inline=None, block=None, **kwargs)¶ The Markdown parser.
Parameters: - renderer – An instance of
Renderer
. - inline – An inline lexer class or instance.
- block – A block lexer class or instance.
-
render
(text)¶ Render the Markdown text.
Parameters: text – markdown formatted text content.
- renderer – An instance of
-
mistune.
markdown
(text, escape=True, **kwargs)¶ Render markdown formatted text to html.
Parameters: - text – markdown formatted text content.
- escape – if set to False, all html tags will not be escaped.
- use_xhtml – output with xhtml tags.
- hard_wrap – if set to True, it will use the GFM line breaks feature.
- parse_block_html – parse text only in block level html.
- parse_inline_html – parse text only in inline level html.
-
mistune.
escape
(text, quote=False, smart_amp=True)¶ Replace special characters “&”, “<” and “>” to HTML-safe sequences.
The original cgi.escape will always escape “&”, but you can control this one for a smart escape amp.
Parameters: - quote – if set to True, ” and ‘ will be escaped.
- smart_amp – if set to False, & will always be escaped.
Changelog¶
Here is the full history of mistune.
Version 0.8.4¶
Released on Oct. 11, 2018
Version 0.8¶
Released on Oct. 26, 2017
- Remove non breaking spaces preprocessing
- Remove rev and rel attribute for footnotes
- Fix bypassing XSS vulnerability by junorouse
This version is strongly recommended, since it fixed a security issue.
Version 0.7.4¶
Released on Mar. 14, 2017
- Fix escape_link method by Marcos Ojeda
- Handle block HTML with no content by David Baumgold
- Use expandtabs for tab
- Fix escape option for text renderer
- Fix HTML attribute regex pattern
Version 0.7.3¶
Released on Jun. 28, 2016
- Fix strikethrough regex
- Fix HTML attribute regex
- Fix close tag regex
Version 0.7.2¶
Released on Feb. 26, 2016
Version 0.7¶
Released on Jul. 18, 2015
- Fix the breaking change in version 0.6 with options: parse_inline_html and parse_block_html
- Breaking change: remove parse_html option for explicit
- Change option escape default value to
True
for security reason
Version 0.6¶
Released on Jun. 17, 2015
- Breaking change on inline HTML, text in inline HTML will not be parsed per #38.
- Replace tag renderer with inline_html for breaking change on inline HTML
- Double emphasis, emphasis, code, and strikethrough can contain one linebreak per #48.
- Match autolinks that do not have / in their URI via #53.
- A work around on link that contains
)
per #46. - Add
<font>
tag for inline tags per #55.
Version 0.5.1¶
Released on Mar. 10, 2015
- Fix a bug when list item is blank via ipython#7929.
- Use python-wheels to build wheels for Mac.
Version 0.5¶
Released on Dec. 5, 2014. This release will break things.
- For custom lexers, features is replaced with rules.
- Refactor on function names and codes.
- Add a way to output the render tree via #20.
- Fix emphasis and strikethrough regular expressions.
Version 0.4.1¶
Released on Oct. 12, 2014
- Add option for parse markdown in block level html.
- Fix on lheading, any number of underline = or - will work.
- Patch for setup if Cython is available but no C compiler.
Version 0.4¶
Released on Aug. 14, 2014
- Bugfix. Use inspect to detect renderer class.
- Move all meth:escape to renderer. Use renderer to escape everything.
- A little changes in code style and parameter naming.
- Don’t parse text in a block html, behave like sundown.
Version 0.3.1¶
Released on Jul. 31, 2014
- Fix in meth:Renderer.block_code, no need to add
\n
in<code>
. - Trim whitespace of code in code span via #15.
Version 0.3¶
Released on Jun. 27, 2014
Version 0.2¶
Released on Mar. 12, 2014
- Use tuple instead of list for efficient
- Add
line_match
andline_started
property on InlineLexer, via #4
Version 0.1¶
First preview release.