@a11ygato/audit-engine
Overview
This project is the engine responsible for auditing a URL, a tree of URLs, or a Puppeteer script in a browser container.
Behind the scenes, we use axe-core to analyze pages, but we don't exclude the possibility of adding other accessibility engines in the future.
There is one browser container: Chrome Headless. There was also Phantom (WebKit) before, but it has been abandoned.
This is the Node.js API. There is also a CLI named @a11ygato/cli.
Axe is configured with these accessibility norms:
These rules are also deactivated:
- bypass
- video-caption
- audio-caption
- object-alt
- video-description
The browser container is installed and instantiated by this module automatically as a dependency.
Installation
npm i -S @a11ygato/audit-engine
Initialize the engine
You must first initialize the audit engine package, once and for all:
const auditEngine = require('@a11ygato/audit-engine');
const settings = {...};
await auditEngine.init(settings);
Trigger an audit or a scenario
After that, you can trigger as many audits or scenarios as you want.
An audit always has a root URL. You may enable crawling to find new URLs.
const auditEngine = require('@a11ygato/audit-engine');
const task = {url: 'https://...', depth, limit, ...};
const audit = auditEngine.createAudit(task);
const report = await audit.run();
Another way to analyze pages is to create a scenario. A scenario groups one or more audits via scripting. A script is simply JavaScript code using the Puppeteer API, with a few specifics.
const auditEngine = require('@a11ygato/audit-engine');
const task = {scenario: '...'};
const scenario = auditEngine.createScenario(task);
const report = await scenario.run();
You may catch errors with a try/catch statement, but you will only receive uncaught errors. An audit or a scenario always completes and, when something went wrong, stores the cause of failure in the exception field.
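As a minimal sketch of what this means in practice, the helper below walks a finished report and lists the entries whose exception field is set. It assumes the layout documented in the Report section (report.urls, each entry carrying an exception string). The name collectFailures and the sample data are hypothetical and only serve the illustration.

```javascript
// Sketch only: lists the entries of a report that ended with a failure.
// Assumes report.urls is an array of URLResult objects whose `exception`
// field is a non-empty string when the URL could not be audited.
function collectFailures(report) {
    const urlResults = Array.isArray(report.urls) ? report.urls : [];
    return urlResults
        .map((urlResult, index) => ({ index, exception: urlResult.exception }))
        .filter((entry) => typeof entry.exception === 'string' && entry.exception.length > 0);
}

// Hand-written sample report, not real engine output.
const failedSample = {
    root: 'https://example.com',
    urls: [
        { exception: '' },
        { exception: 'Navigation timeout' },
    ],
};

console.log(collectFailures(failedSample));
```

Checking the report this way complements try/catch: the catch block sees unexpected errors, while per-URL failures stay inside the report.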
API
auditEngine.init(settings)
Modify settings for the whole engine.
- settings <Object>
  - publicFolder <String> Absolute or relative path to the folder that will store screen captures and source code. Default: ./public.
  - axeScript <String> Absolute or relative path to the axe JavaScript file. If not provided, the axe-core package is looked up in node_modules.
  - includeRawAxeResults <Boolean> Whether the raw results returned by axe should also be included. Default: false.
  - screenshotFilename <String> Basename for the captured screenshot. Default: page.png.
  - sourceFilename <String> Basename for the captured source code. Default: source.html.
  - concurrentInstances <Number> Maximum number of concurrent browser instances. Default: 15. Since one audit can run at most 5 concurrent jobs, between 3 tasks (each with 5 concurrent jobs) and 15 tasks (each with one concurrent job) can run in parallel.
  - proxy <String> Proxy address. Default: null.
- returns <Promise>
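To make the documented defaults concrete, here is a small sketch that spells them out and merges caller overrides on top. Whether init fills in missing values by itself is not stated here, so this only makes the values explicit. The helper name buildSettings and the override values are hypothetical.

```javascript
// Defaults copied from the settings list. axeScript has no documented
// default value (axe-core is looked up in node_modules), so it is left out.
const DOCUMENTED_DEFAULTS = Object.freeze({
    publicFolder: './public',
    includeRawAxeResults: false,
    screenshotFilename: 'page.png',
    sourceFilename: 'source.html',
    concurrentInstances: 15,
    proxy: null,
});

// Hypothetical helper: merges caller overrides on top of the documented defaults.
function buildSettings(overrides = {}) {
    return { ...DOCUMENTED_DEFAULTS, ...overrides };
}

const settings = buildSettings({ publicFolder: '/tmp/a11y-public', concurrentInstances: 5 });
console.log(settings);
```

The resulting object could then be passed to auditEngine.init(settings).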
The engine manages a pool of connections per browser container.
Today only Chrome is left, but there used to be Phantom as well.
There is a limit to the number of connections you can have at the same time.
Even if you launch 20 audits with a concurrency of 5, once the maximum is reached, each job has to wait for a free connection.
This happens automatically, but it is still good to know: it may explain why an audit takes much more time than usual.
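The arithmetic behind this can be sketched as follows. It is a rough upper-bound estimate, assuming every concurrent job holds exactly one pooled connection and every audit has enough URLs to use its full concurrency. The function name estimatePoolUsage is hypothetical.

```javascript
// Rough estimate of pool pressure, under the assumptions stated above.
function estimatePoolUsage(auditCount, concurrency, concurrentInstances = 15) {
    const requested = auditCount * concurrency;              // connections asked for
    const active = Math.min(requested, concurrentInstances); // connections that can be served
    return {
        requested,
        active,
        waiting: requested - active, // jobs queued until a connection is freed
        fullyServedAudits: Math.min(auditCount, Math.floor(concurrentInstances / concurrency)),
    };
}

console.log(estimatePoolUsage(20, 5));
```

With the default of 15 instances, 20 audits at a concurrency of 5 ask for up to 100 connections, of which 15 can be active at once, enough to serve 3 audits at full concurrency.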
auditEngine.getSettings(settings)
Get engine settings.
auditEngine.createAudit(task)
- task <Object>
  - url <String> The only mandatory argument. Root URL to audit, from which crawling may start.
  - limit <Number> Maximum number of URLs that may be audited, independently of the depth. Default: 1. Max: 200.
  - depth <Number> How many levels of crawling may be done. When a URL is audited, if depth is > 0 and/or limit is > 1, the engine parses the source in search of new URLs from the same domain and adds them (protocol, hostname and port must match). Default: 5.
  - concurrency <Number> Maximum number of URLs audited in parallel. Default: 5. Max: 5.
  - timeout <Number> Timeout (in ms) used in several places, notably when running axe tests. Be aware that on some very large sites axe may run for several minutes before returning results. Default: 90000.
  - urlFilter <String> Regular expression content (without // or flags) that lets you bypass the default crawling algorithm and decide precisely which URLs the crawler selects.
- returns <Audit>
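The URL selection rules described for depth and urlFilter can be sketched as below. This is not the engine's crawler: it assumes that urlFilter, when present, fully replaces the default same-origin rule and is tested against the absolute URL. The function name isSelectedByCrawler is hypothetical.

```javascript
// Sketch only: decides whether a discovered link would be kept,
// based on the rules documented for createAudit(task).
function isSelectedByCrawler(rootUrl, candidateUrl, urlFilter) {
    let root;
    let candidate;
    try {
        root = new URL(rootUrl);
        candidate = new URL(candidateUrl, rootUrl); // resolves relative links
    } catch (err) {
        return false; // unparsable URLs are never selected
    }
    if (urlFilter) {
        // urlFilter is the regular expression content, without slashes or flags.
        return new RegExp(urlFilter).test(candidate.href);
    }
    // Default rule: protocol, hostname and port must all match the root URL.
    return candidate.protocol === root.protocol
        && candidate.hostname === root.hostname
        && candidate.port === root.port;
}

console.log(isSelectedByCrawler('https://example.com/', '/about'));
console.log(isSelectedByCrawler('https://example.com/', 'https://example.com/blog/post', '^https://example\\.com/blog/'));
```

Note that the filter is written as a plain string, so backslashes must be doubled inside a JavaScript string literal.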
auditEngine.createScenario(task)
A scenario is JavaScript code (ES6) using the Puppeteer API.
You can use the modern async/await keywords if you want, and you DON'T need any IIFE or async wrapper function to start using them right away.
Whatever code style you like, you should always return a Promise as your last expression:
return new Promise((resolve, reject) => {
// Do something
resolve(); // or reject()
});
// OR
return audit(...).then(() => {
return audit(...);
}).then(() => {
return audit(...);
});
// OR
await audit(...);
await audit(...);
// OR
return Promise.all([audit(...), audit(...), audit(...)]);
Examples
A simple example with BASIC authentication
/************************************************************************************/
// BASIC AUTH
/************************************************************************************/
const username = 'foo';
const password = 'bar';
const url = `https://httpbin.org/basic-auth/${username}/${password}`;
// Credentials must be registered before navigating so they are sent with the request.
await page.authenticate({ username, password });
await page.goto(url);
// Trigger an audit using the current state (current page url).
await audit(page);
// This is another way to set HTTP headers but be aware they will be preserved between navigations.
// const headers = new Map();
// headers.set(
// 'Authorization',
// `Basic ${Buffer.from(`${username}:${password}`).toString('base64')}`
// );
// await page.setExtraHTTPHeaders(headers);
A more advanced example
/************************************************************************************/
// FORM AUTH
/************************************************************************************/
const login = 'foo';
const password = 'bar';
const timeout = 90000;
await page.goto('http://www.orange.fr/portail');
// Click the customer space button
await page.waitForSelector('.espace-client-left', { visible: true, timeout });
await page.click('.espace-client-left');
// Click the identification button
await page.waitForSelector('#sc-identification .ec_button5', { visible: true, timeout });
await page.click('#sc-identification .ec_button5');
// Set username and password
await page.waitForSelector('#default_f_credential', { visible: true, timeout });
await page.$eval('#default_f_credential', el => el.value = '');
await page.focus('#default_f_credential');
await page.type(login);
await page.waitForSelector('#default_f_password', { visible: true, timeout });
await page.focus('#default_f_password');
await page.type(password);
// Submit form
await page.click('#AuthentForm input[type=submit].sc_button_content_2.submit');
await page.waitForNavigation({ timeout });
// Audit current page and another one (crawl).
await audit(page, { limit: 2, depth: 1 });
// Audit another site entirely with the same connection.
// URLs will be audited sequentially because there is only one connection.
await audit(page, { url: 'http://www.orange.es', limit:5 });
// Audit another site without reusing the same connection (will use a shared pool of connections).
// A maximum of 5 URLs can be audited in parallel.
await audit({ url: 'https://twitter.com', concurrency: 5, limit: 3, depth: 2 });
Globals
page:Page
page is an instance of the Puppeteer Page class. You can use it normally, even though some methods are frozen for obvious security reasons.
You can think of page as a tab in Chrome.
audit(page, task)
- page <Page> Optional Puppeteer Page instance.
- task <Object> Task configuration:
  - url <String> URL to audit.
  - limit <Number> Maximum number of URLs that will be audited. Default: 1.
  - depth <Number> Maximum depth when crawling (searching for new URLs). Default: 5.
  - concurrency <Number> Maximum number of URLs audited in parallel (between 1 and 5). Default: 5.
  - timeout <Number> Timeout used during an audit for specific tasks like loading the page, executing axe tests, etc. Default: 180000.
  - urlFilter <String> Regular expression content (without // or flags) that lets you bypass the default crawling algorithm and decide precisely which URLs the crawler selects.
When page is omitted, you must provide at least this minimal task configuration: { url:'https://...' }.
Passing only the current page is equivalent to passing a minimal task configuration containing only a URL (the current one).
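The calling conventions above can be sketched as a small argument resolver. This is an illustration, not the engine's code: it assumes a page is recognised by its url() method (which Puppeteer's Page provides) and that a url given in task takes precedence over the page's current URL, as the advanced example suggests. The names resolveAuditTask and fakePage are hypothetical.

```javascript
// Sketch only: mirrors the documented calling conventions of audit(page, task).
function resolveAuditTask(pageOrTask, task) {
    const hasPage = pageOrTask !== null
        && typeof pageOrTask === 'object'
        && typeof pageOrTask.url === 'function';
    if (hasPage) {
        // audit(page) or audit(page, task): fall back to the page's current URL.
        return { url: pageOrTask.url(), ...task };
    }
    // audit(task): the task must carry at least a URL.
    if (!pageOrTask || typeof pageOrTask.url !== 'string') {
        throw new TypeError('audit(task) requires at least { url }');
    }
    return { ...pageOrTask };
}

// Stand-in for a Puppeteer Page, exposing only url().
const fakePage = { url: () => 'https://example.com/home' };

console.log(resolveAuditTask(fakePage));
console.log(resolveAuditTask(fakePage, { limit: 2, depth: 1 }));
console.log(resolveAuditTask({ url: 'https://twitter.com', limit: 3 }));
```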
Accessible modules
There is not much you can require. Here is the current whitelist:
You have access to the classic ES6 globals plus the Node.js Buffer class.
auditEngine.shutdown()
I encourage you to properly close the engine on shutdown. For instance:
process.once('SIGINT', function () {
console.log('Received SIGINT');
return gracefulShutdown(0);
});
////////
function gracefulShutdown(exitCode) {
// Destroying pools.
return auditEngine.shutdown().then(function() {
console.log('Pools drained');
process.exit(exitCode);
});
}
class: Audit
- run <Function> Starts the audit. Each URL that fails is tried two more times before being marked as failed.
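The retry behavior described for run (one attempt plus two more) can be illustrated with a generic helper. This is a sketch of the general technique, not the engine's implementation. The names runWithRetries and flakyJob are hypothetical.

```javascript
// Sketch only: runs an async job and retries it up to `retries` more times.
async function runWithRetries(job, retries = 2) {
    let lastError;
    for (let attempt = 0; attempt <= retries; attempt += 1) {
        try {
            return await job(attempt);
        } catch (err) {
            lastError = err; // remember the failure and try again
        }
    }
    throw lastError; // every attempt failed
}

// Demo job that fails twice before succeeding.
let calls = 0;
const flakyJob = async () => {
    calls += 1;
    if (calls < 3) throw new Error(`attempt ${calls} failed`);
    return 'audited';
};

runWithRetries(flakyJob).then((result) => console.log(result, calls));
```

In this demo the job succeeds on its third attempt, which is the last one allowed with two retries.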
class: Scenario
Report
An audit or a scenario returns a report with the following structure (which is essentially legacy).
- count <Object>
  - total <Number> Aggregated number of tests
  - pass <Number> Aggregated number of successes
  - error <Number> Aggregated number of errors
  - warning <Number> Aggregated number of warnings
  - notice <Number> Aggregated number of notices
- numElements <Number> Aggregated number of HTML elements in all audited pages
- urls <Array<URLResult>> List of URL results (one per audited URL). Contains axe results.
- root <String> First URL that triggered the audit
URLResult
A URLResult contains axe results for one URL.
- count <Object>
  - total <Number> Aggregated number of tests
  - pass <Number> Aggregated number of successes
  - error <Number> Aggregated number of errors
  - warning <Number> Aggregated number of warnings
  - notice <Number> Aggregated number of notices
- numElements <Number> Number of HTML elements in the page.
- results <Array<Violation>> List of violations (yes, it contains only violations).
- local <String> Relative filepath to the source code, from the public folder root. Default: ''.
- image <String> Relative filepath to the screen capture, from the public folder root. Default: ''.
- exception <String> Error message if this URL failed during the audit.
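The report-level counts are described as aggregated. A minimal sketch of such an aggregation follows, assuming the totals are plain sums of the per-URL count objects (this is not stated explicitly) and that count holds the five keys listed above. The name aggregateCounts and the sample numbers are hypothetical.

```javascript
const COUNT_KEYS = ['total', 'pass', 'error', 'warning', 'notice'];

// Sketch only: sums the per-URL counters into one report-level object.
function aggregateCounts(urlResults) {
    const totals = { total: 0, pass: 0, error: 0, warning: 0, notice: 0 };
    for (const urlResult of urlResults) {
        const count = urlResult.count || {};
        for (const key of COUNT_KEYS) {
            totals[key] += Number(count[key]) || 0; // missing counters count as 0
        }
    }
    return totals;
}

// Hand-written sample data, not real engine output.
const sampleUrlResults = [
    { count: { total: 10, pass: 7, error: 3, warning: 0, notice: 0 } },
    { count: { total: 5, pass: 5, error: 0, warning: 0, notice: 0 } },
];

console.log(aggregateCounts(sampleUrlResults));
```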
Violation
A violation represents an axe error on a specific node.
- code <String> Error label.
- type <String> Fixed to error.
- message <String> Error message.
- selector <Array<String>> List of HTML node elements.
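As a usage sketch for this structure, the helper below counts how often each violation code appears across a whole report. It relies only on the documented fields (report.urls, results, code). The name countViolationsByCode and the sample codes are made up for the example.

```javascript
// Sketch only: counts violations per `code` across every audited URL.
function countViolationsByCode(report) {
    const byCode = new Map();
    for (const urlResult of report.urls || []) {
        for (const violation of urlResult.results || []) {
            byCode.set(violation.code, (byCode.get(violation.code) || 0) + 1);
        }
    }
    return byCode;
}

// Hand-written sample report, not real engine output.
const violationSample = {
    urls: [
        { results: [
            { code: 'image-alt', type: 'error', message: 'Images must have alternate text', selector: ['img.logo'] },
            { code: 'color-contrast', type: 'error', message: 'Insufficient contrast', selector: ['p.intro'] },
        ] },
        { results: [
            { code: 'image-alt', type: 'error', message: 'Images must have alternate text', selector: ['img.banner'] },
        ] },
    ],
};

console.log(countViolationsByCode(violationSample));
```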