让我们一起来构建一个模板引擎(四)_html/css_WEB-ITnose
在 上篇文章 中我们的模板引擎实现了对 include 和 extends 的支持, 到此为止我们已经实现了模板引擎所需的大部分功能。 在本文中我们将解决一些用于生成 html 的模板引擎需要面对的一些安全问题。
转义
首先要解决的就是转义问题。到目前为止我们的模板引擎并没有对变量和表达式结果进行转义处理, 如果用于生成 html 源码的话就会出现下面这样的问题 ( template3c.py ):
>>> from template3c import Template>>> t = Template('<h1 id="title">{{ title }}</h1>')>>> t.render({'title': 'hello<br />world'})'<h1 id="hello-br-world">hello<br />world</h1>'
很明显 title 中包含的标签需要被转义,不然就会出现非预期的结果。 这里我们只对 & " ' > < 这几个字符做转义处理,其他的字符可根据需要进行处理。
html_escape_table = { '&': '&', '"': '"', '\'': ''', '>': '>', '<': '<',}def html_escape(text): return ''.join(html_escape_table.get(c, c) for c in text)
转义效果:
>>> html_escape('hello<br />world')'hello<br />world'
既然有转义自然也要有禁止转义的功能,毕竟不能一刀切否则就丧失灵活性了。
class NoEscape: def __init__(self, raw_text): self.raw_text = raw_textdef escape(text): if isinstance(text, NoEscape): return str(text.raw_text) else: text = str(text) return html_escape(text)def noescape(text): return NoEscape(text)
最终我们的模板引擎针对转义所做的修改如下(可以下载 template4a.py ):
class Template: def __init__(self, ..., auto_escape=True): ... self.auto_escape = auto_escape self.default_context.setdefault('escape', escape) self.default_context.setdefault('noescape', noescape) ... def _handle_variable(self, token): if self.auto_escape: self.buffered.append('escape({})'.format(variable)) else: self.buffered.append('str({})'.format(variable)) def _parse_another_template_file(self, filename): ... template = self.__class__( ..., auto_escape=self.auto_escape ) ...class NoEscape: def __init__(self, raw_text): self.raw_text = raw_texthtml_escape_table = { '&': '&', '"': '"', '\'': ''', '>': '>', '<': '<',}def html_escape(text): return ''.join(html_escape_table.get(c, c) for c in text)def escape(text): if isinstance(text, NoEscape): return str(text.raw_text) else: text = str(text) return html_escape(text)def noescape(text): return NoEscape(text)
效果:
>>> from template4a import Template>>> t = Template('<h1 id="title">{{ title }}</h1>')>>> t.render({'title': 'hello<br />world'})'<h1 id="hello-lt-br-gt-world">hello<br />world</h1>'>>> t = Template('<h1 id="noescape-title">{{ noescape(title) }}</h1>')>>> t.render({'title': 'hello<br />world'})'<h1 id="hello-br-world">hello<br />world</h1>'>>>
exec 的安全问题
由于我们的模板引擎是使用 exec 函数来执行生成的代码的,所以就需要注意一下 exec 函数的安全问题,预防可能的服务端模板注入攻击(详见 使用 exec 函数时需要注意的一些安全问题 )。
首先要限制的是在模板中使用内置函数和执行时上下文变量( template4b.py ):
class Template: ... def render(self, context=None): """渲染模版""" namespace = {} namespace.update(self.default_context) namespace.setdefault('__builtins__', {}) # <--- if context: namespace.update(context) exec(str(self.code_builder), namespace) result = namespace[self.func_name]() return result
效果:
>>> from template4b import Template>>> t = Template('{{ open("/etc/passwd").read() }}')>>> t.render()Traceback (most recent call last): File "", line 1, in module> File "/Users/mg/develop/lsbate/part4/template4b.py", line 245, in render result = namespace[self.func_name]() File "", line 3, in __func_nameNameError: name 'open' is not defined
然后就是要限制通过其他方式调用内置函数的行为:
>>> from template4b import Template>>> t = Template('{{ escape.__globals__["__builtins__"]["open"]("/etc/passwd").read()[0] }}')>>> t.render()'#'>>>>>> t = Template("{{ [x for x in [].__class__.__base__.__subclasses__() if x.__name__ == '_wrap_close'][0].__init__.__globals__['path'].os.system('date') }}")>>> t.render()Mon May 30 22:10:46 CST 2016'0'
一种解决办法就是不允许在模板中访问以下划线 _ 开头的属性。 为什么要包括单下划线呢,因为约定单下划线开头的属性是约定的私有属性, 不应该在外部访问这些属性。
这里我们使用 dis 模块来帮助我们解析生成的代码,然后再找出其中的特殊属性。
import disimport ioclass Template: def __init__(self, ..., safe_attribute=True): ... self.safe_attribute = safe_attribute def render(self, ...): ... func = namespace[self.func_name] if self.safe_attribute: check_unsafe_attributes(func) result = func()def check_unsafe_attributes(code): writer = io.StringIO() dis.dis(code, file=writer) output = writer.getvalue() match = re.search(r'\d+\s+LOAD_ATTR\s+\d+\s+<span class='MathJax_Preview'><img src='http://python.jobbole.com/wp-content/plugins/latex/cache/tex_528889fac10d588d0c4bcca5fc09d9ee.gif' style="max-width:90%" class='tex' alt="(?P<attr>_[^" /></span><script type='math/tex'>(?P<attr>_[^</script>]+)<span class='MathJax_Preview'><img src='http://python.jobbole.com/wp-content/plugins/latex/cache/tex_8e9262bd0cfe5666042f5b56e0808688.gif' style="max-width:90%" class='tex' alt="', output) if match is not None: attr = match.group('attr') msg = "access to attribute '{0}' is unsafe.".format(attr) raise AttributeError(msg)
效果:
>>> from template4c import Template>>> t = Template("{{ [x for x in [].__class__.__base__.__subclasses__() if x.__name__ == '_wrap_close'][0].__init__.__globals__['path'].os.system('date') }}")>>> t.render()Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/xxx/lsbate/part4/template4c.py", line 250, in render check_unsafe_attributes(func) File "/xxx/lsbate/part4/template4c.py", line 296, in check_unsafe_attributes raise AttributeError(msg)AttributeError: access to attribute '__class__' is unsafe.>>>>>> t = Template('<h1 id="title">{{ title }}</h1>')>>> t.render({'title': 'hello<br />world'})'<h1 id="hello-lt-br-gt-world">hello<br />world</h1>'
这个系列的文章到目前为止就已经全部完成了。
如果大家感兴趣的话可以尝试使用另外的方式来解析模板内容, 即: 使用词法分析/语法分析的方式来解析模板内容(欢迎分享实现过程)。
P.S. 整个系列的所有文章地址:
- 让我们一起来构建一个模板引擎(三)
- 让我们一起来构建一个模板引擎(二)
- 让我们一起来构建一个模板引擎(一)
P.S. 文章中涉及的代码已经放到 GitHub 上了: https://github.com/mozillazg/lsbate
打赏支持我写出更多好文章,谢谢!
打赏作者
打赏支持我写出更多好文章,谢谢!
关于作者:mozillazg
好好学习,天天向上。 个人主页 · 我的文章 · 1 ·

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

HTML is suitable for beginners because it is simple and easy to learn and can quickly see results. 1) The learning curve of HTML is smooth and easy to get started. 2) Just master the basic tags to start creating web pages. 3) High flexibility and can be used in combination with CSS and JavaScript. 4) Rich learning resources and modern tools support the learning process.

HTML defines the web structure, CSS is responsible for style and layout, and JavaScript gives dynamic interaction. The three perform their duties in web development and jointly build a colorful website.

WebdevelopmentreliesonHTML,CSS,andJavaScript:1)HTMLstructurescontent,2)CSSstylesit,and3)JavaScriptaddsinteractivity,formingthebasisofmodernwebexperiences.

GiteePages static website deployment failed: 404 error troubleshooting and resolution when using Gitee...

AnexampleofastartingtaginHTMLis,whichbeginsaparagraph.StartingtagsareessentialinHTMLastheyinitiateelements,definetheirtypes,andarecrucialforstructuringwebpagesandconstructingtheDOM.

To achieve the effect of scattering and enlarging the surrounding images after clicking on the image, many web designs need to achieve an interactive effect: click on a certain image to make the surrounding...

HTML, CSS and JavaScript are the three pillars of web development. 1. HTML defines the web page structure and uses tags such as, etc. 2. CSS controls the web page style, using selectors and attributes such as color, font-size, etc. 3. JavaScript realizes dynamic effects and interaction, through event monitoring and DOM operations.

The Y-axis position adaptive algorithm for web annotation function This article will explore how to implement annotation functions similar to Word documents, especially how to deal with the interval between annotations...
