Huginn

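Huginn deployment: you can deploy with Docker by following deploy Huginn inside of Docker and the .env settings, or follow the tutorial below to deploy Huginn on a server step by step. Docker deployment is simpler but more error-prone.
Huginn scraping tutorial: RSS Advanced: Huginn - Truly Custom RSS Feeds for Any Web Page (Scraping with PhantomJs)
Usage notes:
- Raw upstream data does not need a long retention period (e.g. 1d or 7d). The final, trimmed data can be kept longer (e.g. forever or 180d).
- DeDuplicationAgent's keep_events_for sets how long the de-duplicated events it emits are kept. It has nothing to do with the de-duplication lookback. De-duplication works as follows:
  1. For each received event, extract the specified property (or the whole payload) and compute its CRC32 hash;
  2. Check whether that hash already appears in the Agent's in-memory array memory['properties'];
  3. If it does not, re-emit the event and push the hash onto the array; otherwise drop the event;
  4. lookback sets the array length. When it is not 0 and the array has reached that length, the oldest entry is removed first. lookback = 0 means the in-memory hash array is never trimmed, which allows de-duplication across arbitrarily long periods; it does not affect whether historical events are kept in the events table. keep_events_for is unrelated: it only controls how long the events table keeps events, and it works independently of memory['properties'] (the Agent's memory).
  5. Tip: if you want the hashes to survive an Agent restart or a manual memory clear, export or back up the Agent configuration and keep it under version control. Once the memory has been cleared (memory.clear), the events within the de-duplication range have to be run through the Agent again to rebuild it.
- LiquidOutputAgent does not create events, so keep_events_for has no practical effect on it. In "Last X events" mode, LiquidOutputAgent reads the events that its connected upstream Agents created and sent to it. Set keep_events_for on those upstream nodes long enough that the events are still there when LiquidOutputAgent reads them.
- DataOutputAgent outputs the event data it receives as RSS or JSON. Its keep_events_for option controls how long the events this Agent itself generates are kept in the database, not the upstream events it receives.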
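The de-duplication steps listed above fit in a few lines of Python. This is a minimal sketch, not Huginn's Ruby implementation. Three details are assumptions based on the description above: the class name `DedupSketch`, hashing the whole payload as key-sorted JSON, and applying `zlib.crc32` to every value.

```python
import json
import zlib


class DedupSketch:
    """Hypothetical model of DeDuplicationAgent's memory['properties'] logic."""

    def __init__(self, lookback=0, prop=None):
        self.lookback = lookback      # 0 = never trim the hash list
        self.prop = prop              # None = use the whole payload
        self.properties = []          # stands in for memory['properties']

    def _key(self, payload):
        # Assumption: the whole payload is serialized as key-sorted JSON.
        raw = payload[self.prop] if self.prop else json.dumps(payload, sort_keys=True)
        return zlib.crc32(str(raw).encode("utf-8"))

    def receive(self, payload):
        """Return the payload if it is new, otherwise None (dropped)."""
        key = self._key(payload)
        if key in self.properties:
            return None
        # With a non-zero lookback, drop the oldest hash once the list is full.
        if self.lookback and len(self.properties) >= self.lookback:
            self.properties.pop(0)
        self.properties.append(key)
        return payload


agent = DedupSketch(lookback=2, prop="url")
emitted = [agent.receive({"url": u}) for u in ["a", "b", "a", "c", "a"]]
```

With lookback=2, the second "a" is dropped. The last "a" is emitted again because its hash has already been trimmed out of the two-entry window. With lookback=0 it would stay a duplicate indefinitely.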

Common Agents

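Website Agent parses web pages, XML documents and JSON data; the most commonly used
Event Formatting Agent formats the content of incoming events and lets you add custom new fields
Phantom Js Cloud Agent uses PhantomJS to fetch the source of dynamic pages, which helps against sites that block crawlers
Trigger Agent watches incoming events; mostly used to filter content, or to watch for sensitive events and then send alerts such as emails
De Duplication Agent removes duplicates
Data Output Agent publishes data externally as RSS and JSON
Liquid Output Agent outputs data in a custom format; it can be used to create HTML pages, JSON data, etc.
Webhook Agent
Javascript Agent runs custom JS code for custom processing
Digest Agent is an aggregation node that collects all received events and sends them out as a single event
Email Agent sends the latest received messages by email
Post Agent can be triggered by other nodes; it merges event data into a fixed template and sends a POST or GET request to a given URL
Delay Agent holds events (or copies of them) as a temporary store or buffer and releases them together on schedule
Scheduler Agent acts as a timer node
Attribute Difference Agent compares numeric differences
Commander Agent sends commands to other nodes, for example to run or stop them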

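{{created_at}} is the built-in timestamp of when the event was created. To match the special character + in Agent settings, escape it with \\ (that is, write \\+).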
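Two backslashes are needed because the escaping happens twice. Agent options are JSON, so `\\` in the option text becomes a single `\` in the string Huginn sees, and the regex engine then reads `\+` as a literal plus. A quick Python check of that two-step unescaping follows. Python's `re` stands in for Huginn's Ruby regexes here, on the assumption that both treat `\+` and a bare leading `+` the same way.

```python
import json
import re

# The option exactly as typed in the Agent's JSON settings: "\\+0800"
option_json = '{"pattern": "\\\\+0800"}'
pattern = json.loads(option_json)["pattern"]   # JSON decoding leaves \+0800

stamp = "2023-03-02 23:33:30 +0800"
cleaned = re.sub(pattern, "", stamp)           # \+ matches a literal plus sign

try:
    re.compile("+0800")                        # unescaped: + has nothing to repeat
    bare_plus_ok = True
except re.error:
    bare_plus_ok = False
```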

Huginn deployment

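Huginn jobs sometimes get stuck, which blocks all later jobs, and restarting the container does not fix it. So I switched to a manual Huginn deployment and periodically run the reset commands below to keep jobs from getting stuck.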

cd /home/huginn/huginn
sudo bundle exec rake production:force_stop
sudo bundle exec rake production:export

After the server restarts, run the following commands one line at a time:

sudo docker exec -it huginn bash
sudo service mysql restart
sudo service mysql start
sudo service nginx restart
cd /home/huginn/huginn
git config --global --add safe.directory /home/huginn/huginn
sudo runsvdir /etc/service &
sudo bundle exec rake production:export

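Frequently used Huginn locations include /home/huginn/huginn (the .env environment settings) and /var/lib/mysql (the database). To keep these locations on external storage, set the external storage location's permissions to everyone; otherwise you will get errors.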

Note that by default the internal database does not accept connections from outside. To allow external connections, do the following:

Open the file with sudo vim /etc/mysql/mysql.conf.d/mysqld.cnf, find the bind-address line and comment it out by adding # at the start: #bind-address = 127.0.0.1. Also set max_allowed_packet to 200M.

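Take the connecting IP from the connection error message and grant it privileges. Also grant the PROCESS privilege, which makes later database backups easier. For backups you can use a backup_script.sh script that periodically exports SQL files to external storage.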

mysql -u root -p
GRANT ALL PRIVILEGES ON *.* TO 'huginn'@'172.17.0.1' IDENTIFIED BY 'YourPassword';
GRANT PROCESS ON *.* TO 'huginn'@'localhost';
FLUSH PRIVILEGES;
\q
sudo service mysql restart
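The backup_script.sh mentioned above is not shown. A minimal Python sketch of what such a periodic export could look like follows. It assumes `mysqldump` is installed, credentials come from a `[client]` section in `~/.my.cnf`, the `huginn` user and `huginn_production` database are the ones from this guide, and `/mnt/backup` is a hypothetical external-storage mount. `build_dump_command` only assembles the command, so it can be checked before anything runs.

```python
import datetime
import pathlib
import subprocess


def build_dump_command(user, database, out_dir, today=None):
    """Return (argv, output_path) for a dated mysqldump of one database."""
    today = today or datetime.date.today()
    out_path = pathlib.Path(out_dir) / f"{database}-{today:%Y%m%d}.sql"
    argv = [
        "mysqldump",
        "-u", user,                 # password assumed to come from ~/.my.cnf
        "--single-transaction",     # consistent InnoDB snapshot without table locks
        database,
    ]
    return argv, out_path


def run_backup(user="huginn", database="huginn_production", out_dir="/mnt/backup"):
    """Write the dump to out_dir; raises CalledProcessError if mysqldump fails."""
    argv, out_path = build_dump_command(user, database, out_dir)
    with open(out_path, "wb") as fh:
        subprocess.run(argv, stdout=fh, check=True)
    return out_path
```

Calling `run_backup()` from cron once a day would produce one dated .sql file per day on the external storage.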

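Two open questions:

Test whether rake production:export helps when jobs are stuck. (Since the optimizations, jobs have not got stuck again?)
During deployment, the production:export step reports unable to lock supervise/lock: temporary failure. This error does not seem to affect Huginn, but I'll look for related errors when I have time.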

Manual deployment on Ubuntu

Deployment environment: an Ubuntu 18.04 Docker image (the same steps work on a server). Installation references: Manual Installation on Debian/Ubuntu and Novice-setup-guide. For manual upgrades, see manual Update.

Huginn deployment steps:

# Open a shell in the huginn container (some containers need /bin/bash instead of bash)
sudo docker exec -it huginn bash
# run as root!
apt-get update -y
apt-get upgrade -y
apt-get install sudo -y

# Install vim and set as default editor
sudo apt-get install -y vim
sudo update-alternatives --set editor /usr/bin/vim.basic

# Install the required packages
sudo apt-get install -y runit build-essential git zlib1g-dev libyaml-dev libssl-dev libgdbm-dev libreadline-dev libncurses5-dev libffi-dev curl openssh-server checkinstall libxml2-dev libxslt-dev libcurl4-openssl-dev libicu-dev logrotate pkg-config cmake nodejs graphviz jq shared-mime-info

# Ubuntu 18.04 Bionic
sudo apt-get install -y runit-systemd

# Download Ruby and compile it:
mkdir /tmp/ruby && cd /tmp/ruby
curl -L --progress-bar https://cache.ruby-lang.org/pub/ruby/3.2/ruby-3.2.6.tar.xz | tar xJ
cd ruby-3.2.6
./configure --disable-install-rdoc
make -j`nproc`
sudo make install

sudo gem update --system --no-document
sudo gem install foreman --no-document

# Create a user for Huginn:
sudo adduser --disabled-login --gecos 'Huginn' huginn

# Install the database packages
sudo apt-get install -y mysql-server mysql-client libmysqlclient-dev

Run service mysql start to start the database first. Otherwise the next database setup step is likely to fail with Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock'.^[1]

# Set the database root password (interactive, step by step)
sudo mysql_secure_installation

# Log in to the database with the password set above
mysql -u root -p

# ⚠️ Enter the lines below one at a time at the database prompt `mysql>`, replacing `$password` with the password you want to use
CREATE USER 'huginn'@'localhost' IDENTIFIED BY '$password';
SET default_storage_engine=INNODB;
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER, LOCK TABLES ON `huginn_production`.* TO 'huginn'@'localhost';
FLUSH PRIVILEGES;
\q

Once the database is set up, fetch the main Huginn program. This block of commands can be pasted into the SSH session in one go.

# We'll install Huginn into the home directory of the user "huginn"
cd /home/huginn

# Clone Huginn repository
sudo -u huginn -H git clone https://github.com/huginn/huginn.git -b master huginn

# If you run into problems with Ruby 3.2 or similar, you can check out a different branch
# sudo -u huginn -H git clone https://github.com/huginn/huginn.git -b latest_rubygems huginn

# Go to Huginn installation folder
cd /home/huginn/huginn

# Copy the example Huginn config
sudo -u huginn -H cp .env.example .env

# Create the log/, tmp/pids/ and tmp/sockets/ directories
sudo -u huginn mkdir -p log tmp/pids tmp/sockets

# Make sure Huginn can write to the log/ and tmp/ directories
sudo chown -R huginn log/ tmp/
sudo chmod -R u+rwX,go-w log/ tmp/

# Make sure permissions are set correctly
sudo chmod -R u+rwX,go-w log/
sudo chmod -R u+rwX tmp/
sudo -u huginn -H chmod o-rwx .env

# Copy the example Unicorn config
sudo -u huginn -H cp config/unicorn.rb.example config/unicorn.rb

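Run sudo -u huginn -H editor .env to configure Huginn's environment; for more options, see the .env settings example. The editor is the vim installed above: press i to insert at the cursor, Esc to leave insert mode, and :wq to save and quit.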

DATABASE_ADAPTER=mysql2
#DATABASE_ENCODING=utf8   # changed (commented out)
DATABASE_RECONNECT=true
DATABASE_NAME=huginn_production  # changed
DATABASE_POOL=20
DATABASE_USERNAME=huginn   # changed
DATABASE_PASSWORD='$password' # changed: use your own password
#DATABASE_HOST=your-domain-here.com
#DATABASE_PORT=3306
#DATABASE_SOCKET=/tmp/mysql.sock

# MySQL only: If you are running a MySQL server >=5.5.3, you should
# set DATABASE_ENCODING to utf8mb4 instead of utf8 so that the
# database can hold 4-byte UTF-8 characters like emoji.
DATABASE_ENCODING=utf8mb4  # changed

...
RAILS_ENV=production  # changed

USE_GRAPHVIZ_DOT=dot # uncomment to enable Graphviz for generating diagrams

TIMEZONE="Beijing" # list valid names with bundle exec rake time:zones:local; the time zone must use that format, otherwise you get runsv not running

DEFAULT_HTTP_USER_AGENT="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36" # browser-like User-Agent for requests

# Email (SMTP) settings
SMTP_DOMAIN=newzone.top
SMTP_USER_NAME=benson@newzone.top
SMTP_PASSWORD=somepassword
SMTP_SERVER=smtp.feishu.cn
SMTP_PORT=465
SMTP_AUTHENTICATION=plain
SMTP_ENABLE_STARTTLS_AUTO=true
SMTP_SSL=true
SEND_EMAIL_IN_DEVELOPMENT=true

# Maximum runtime of background jobs in minutes
# Defaults to 2 minutes; you may need to raise it if you process a lot of data
DELAYED_JOB_MAX_RUNTIME=10

Before installing the gems, reset ownership of the working directory to the huginn user with sudo chown -R huginn:huginn /home/huginn. This prevents the error Your user account isn't allowed to install to the system RubyGems.

# Watch for warnings printed in yellow
gem install bundler
# Time zone data is often missing in Docker environments (6-70)
apt-get install tzdata
# Install Gems
sudo -u huginn -H bundle config set deployment 'true'
sudo -u huginn -H bundle config set without 'development test'
sudo -u huginn -H bundle install
# Fallback commands for fixing gem problems
# bundle update
# gem update bundler
# vim /home/huginn/huginn/Gemfile

# Initialize Database
# Create the database
sudo -u huginn -H bundle exec rake db:create RAILS_ENV=production

# Migrate to the latest version
sudo -u huginn -H bundle exec rake db:migrate RAILS_ENV=production

# ⚠️ Set the login username and password. Create admin user and example agents using the default admin/password login
sudo -u huginn -H bundle exec rake db:seed RAILS_ENV=production SEED_USERNAME=admin SEED_PASSWORD=password

# Compile Assets
sudo -u huginn -H bundle exec rake assets:precompile RAILS_ENV=production

Run sudo -u huginn -H editor Procfile to adjust Huginn's process settings. If you want to run several workers, uncomment the Multiple DelayedJob workers section.

# Comment out the next two lines by adding # in front of them
#web: bundle exec rails server -p ${PORT-3000} -b ${IP-0.0.0.0}
#jobs: bundle exec rails runner bin/threaded.rb

# Uncomment the next two lines by removing the leading #
web: bundle exec unicorn -c config/unicorn.rb
jobs: bundle exec rails runner bin/threaded.rb

For the error 'sv stop huginn-web-1' exited with a non-zero return value: fail: huginn-web-1: runsv not running, run foreman export runit -a huginn -l /home/huginn/huginn/log /etc/service and then chown -R huginn:huginn /etc/service/huginn*.^[2] ^[3] If the error appears when restarting Huginn, check the settings in sudo -u huginn -H editor .env.

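# Switch to the Huginn directory
cd /home/huginn/huginn
# Mark the repository as a safe directory for git
git config --global --add safe.directory /home/huginn/huginn
# Start the services (the setup to repeat at boot)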
sudo runsvdir /etc/service &
sudo bundle exec rake production:export

# Setup Logrotate
sudo cp deployment/logrotate/huginn /etc/logrotate.d/huginn

# Ensure Your Huginn Instance Is Running
sudo bundle exec rake production:status

Nginx site configuration:

sudo apt-get install -y nginx

# Site Configuration
sudo cp deployment/nginx/huginn /etc/nginx/sites-available/huginn
sudo ln -s /etc/nginx/sites-available/huginn /etc/nginx/sites-enabled/huginn

# Change YOUR_SERVER_FQDN to the fully-qualified domain name of your host serving Huginn.
sudo editor /etc/nginx/sites-available/huginn

# If you don't need HTTPS, change it to the configuration below
server {
  listen 80; # port to listen on
  server_name localhost home.newzone.top; # domain names or IPs; two are enabled here, separated by a space

# Check that the configuration is valid
sudo nginx -t

# Remove the default site; only run the next command if Huginn is the only website on this server/container
sudo rm /etc/nginx/sites-enabled/default

This completes the Huginn deployment. Run sudo service nginx restart and the site is ready to use.

Huginn Docker

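The Huginn multi-process image is based on Ubuntu 18.04 and does not have root access. Unless you export volumes or use a separate database container, you cannot update Huginn without losing data. You can manually set the database's external port and an external storage path.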

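In addition, the official image uses different paths from the manual install and does not support the force_stop command. For Docker, the official recommendation is to delete stuck jobs from the database with the commands below. In my tests this works, but once I hit an unknown bug: the stuck jobs were deleted, yet later jobs still did not continue.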

# get a shell inside the docker container (replace huginn with the name or id of the container)
sudo docker exec -it huginn /bin/bash

# source the environment file
source .env

# get a rails console
bundle exec rails console

# inside the rails console, delete the stuck (locked but not failed) jobs
Delayed::Job.where('locked_at IS NOT NULL AND locked_by IS NOT NULL AND failed_at IS NULL').destroy_all

Agents

Trigger Agent

The Trigger Agent picks out events that match the given conditions.

# The content field contains neither 周雅萌 nor 邓雅萌
{
  "expected_receive_period_in_days": "2",
  "keep_event": "true",
  "rules": [
    {
      "type": "!regex",
      "value": "周雅萌|邓雅萌",
      "path": "$.content"
    }
  ],
  "message": "Looks like your pattern matched in '{{value}}'!"
}

# The title contains the brand name iluminage or 易美肌
{
  "expected_receive_period_in_days": "4",
  "keep_event": "true",
  "rules": [
    {
      "type": "regex",
      "value": "iluminage|易美肌",
      "path": "$.title"
    }
  ],
  "message": "Looks like your pattern matched in '{{value}}'!"
}
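The behavior of the two rule types above can be sketched in Python. This is a simplified model under three assumptions: a rule's `path` is a plain top-level `$.field` lookup (real JSONPath is richer), an event passes only when every rule matches, and the helper names are hypothetical. The sketch also shows why a pattern like `周雅萌 | 邓雅萌`, with spaces around `|`, lets matches slip through.

```python
import re


def field(event, path):
    """Resolve a simple '$.name' path; a stand-in for full JSONPath."""
    key = path[2:] if path.startswith("$.") else path
    return str(event.get(key, ""))


def rule_matches(event, rule):
    found = re.search(rule["value"], field(event, rule["path"])) is not None
    # "!regex" passes when the pattern is NOT found.
    return not found if rule["type"] == "!regex" else found


def trigger(event, rules):
    # Assumption: the event passes only if every rule matches.
    return all(rule_matches(event, r) for r in rules)


exclude = [{"type": "!regex", "value": "周雅萌|邓雅萌", "path": "$.content"}]
brand = [{"type": "regex", "value": "iluminage|易美肌", "path": "$.title"}]
spaced = [{"type": "!regex", "value": "周雅萌 | 邓雅萌", "path": "$.content"}]
```

With the spaced pattern, the alternatives become `周雅萌 ` (trailing space) and ` 邓雅萌` (leading space). A name that sits directly between other characters is therefore not found, and the event passes the filter.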

Liquid Output Agent

Organizes data with a custom template and outputs it as links in HTML, JSON or XML format.

The mode is usually set to Last X events, which outputs the received data externally; the default is 1000.

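In Last X events mode, Event limit controls how many events are output and over what time span. Set Event limit to 100 to output 100 events, or to "1 day" or "5 minutes" to output only events from the most recent day (or five minutes).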
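The two forms of Event limit (a count or a time span) can be modeled as below. This is only a sketch. `select_events` is a hypothetical name, and its parser accepts just integer counts and "N minute(s)/hour(s)/day(s)". The real LiquidOutputAgent may accept other formats.

```python
import datetime as dt

UNITS = {"minute": 60, "hour": 3600, "day": 86400}


def select_events(events, limit, now):
    """events: list of (created_at, payload) tuples, oldest first."""
    limit = str(limit).strip()
    if limit.isdigit():
        n = int(limit)
        return events[-n:] if n else []          # the last N events
    amount, unit = limit.split()                 # e.g. "5 minutes"
    seconds = int(amount) * UNITS[unit.rstrip("s")]
    cutoff = now - dt.timedelta(seconds=seconds)
    return [e for e in events if e[0] >= cutoff]  # only recent events
```

Either way, the agent can only show events that still exist. That is why the upstream keep_events_for has to cover the chosen window.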

Event Formatting Agent

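The Event Formatting Agent lets you format incoming events and add new fields as needed. You can use regular expressions to replace parts of the input. For a worked example, see 新京报 (Beijing News) #5, cleaning up the section field format.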

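# Common strftime() placeholders
# %Y year, %m month, %d day of month, %H hour (24-hour clock), %M minute, %S second, %B full English month name, `%I` hour (12-hour clock), `%p` AM/PM, `%e` day of month without a leading zero (space-padded).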
"created_at": "{{created_at | date:'%Y-%m-%d'}}"

# Turn 2023-03-02 23:33:30 +0800 into 2023-03-02
"created_at": "{{created_at | regex_replace: ' ', ''| regex_replace: '(([0-1]?[0-9])|([2][0-3])):([0-5]?[0-9])(:([0-5]?[0-9]))?', ''| regex_replace: '\\+0800', ''}}"

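You can check what the two rules above produce in Python. `strftime` stands in for Liquid's date filter, which uses Ruby's strftime with the same common placeholders, and `re.sub` stands in for the regex_replace chain.

```python
import re
from datetime import datetime

raw = "2023-03-02 23:33:30 +0800"

# Rule 1: parse, then format with %Y-%m-%d (like date:'%Y-%m-%d').
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S %z")
via_date = parsed.strftime("%Y-%m-%d")

# Rule 2: the regex_replace chain, one step at a time.
step1 = re.sub(" ", "", raw)                       # drop all spaces
step2 = re.sub(r"(([0-1]?[0-9])|([2][0-3])):([0-5]?[0-9])(:([0-5]?[0-9]))?", "", step1)
via_regex = re.sub(r"\+0800", "", step2)           # drop the fixed offset

# Another placeholder combination from the list above.
twelve_hour = parsed.strftime("%I:%M %p")
```

Both rules give 2023-03-02 here. The date filter is simpler, though, and does not rely on the offset being exactly +0800.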
Regex rewriting

For example, with the timestamp rule "created_at": "{{created_at}}", the default time looks like 2022-07-06 21:09:51 +0800. The regex rule that strips the time part is "created_at": "{{created_at | regex_replace: ' ', ''| regex_replace: '(([0-1]?[0-9])|([2][0-3])):([0-5]?[0-9])(:([0-5]?[0-9]))?', ''| regex_replace: '\\+0800', ''}}".

Adding prefixes and suffixes

When a scraped link is incomplete, complete it with a prefix, for example "url_link": "https://so.toutiao.com{{temp_link}}".
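Plain concatenation, as in the example above, works when every scraped link is a root-relative path like `/search?...`. If some links are already absolute, resolving them against the base URL is more robust. The Python illustration below uses `urllib.parse.urljoin`. The `complete_link` helper and the sample paths are hypothetical, and the equivalent Liquid approach is not covered here.

```python
from urllib.parse import urljoin

BASE = "https://so.toutiao.com"


def complete_link(temp_link):
    """Like BASE + temp_link for root-relative links, but keeps absolute links intact."""
    return urljoin(BASE + "/", temp_link)
```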

For example, here is a possible Event:

{
  "high": { "celsius": "18", "fahrenheit": "64" },
  "date":
    { "epoch": "1357959600", "pretty": "10:00 PM EST on January 11, 2013" },
  "conditions": "Rain showers",
  "data": "This is some data"
}

You may want to send this event to another Agent, for example a Twilio Agent, which expects a message key. You can use an Event Formatting Agent's instructions setting to do this in the following way:

"instructions": {
  "message": "Today's conditions look like {{conditions}} with a high temperature of {{high.celsius}} degrees Celsius.",
  "subject": "{{data}}",
  "created_at": "{{created_at}}"
}

Names here like conditions, high and data refer to the corresponding values in the Event hash.

The special key created_at refers to the timestamp of the Event, which can be reformatted by the date filter, like {{created_at | date:"at %I:%M %p" }}.

The upstream agent of each received event is accessible via the key agent, which has the following attributes: name, options, sources, type, url, id, disabled, memory, controllers, schedule, keep_events_for, propagate_immediately, working, receivers, control_targets.

Have a look at the Wiki to learn more about liquid templating.

Events generated by this possible Event Formatting Agent will look like:

{
  "message": "Today's conditions look like Rain showers with a high temperature of 18 degrees Celsius.",
  "subject": "This is some data"
}

In matchers setting you can perform regular expression matching against contents of events and expand the match data for use in instructions setting. Here is an example:

{
  "matchers": [
    {
      "path": "{{date.pretty}}",
      "regexp": "\\A(?<time>\\d\\d:\\d\\d [AP]M [A-Z]+)",
      "to": "pretty_date"
    }
  ]
}

This virtually merges the following hash into the original event hash:

"pretty_date": {
  "time": "10:00 PM EST",
  "0": "10:00 PM EST on January 11, 2013",
  "1": "10:00 PM EST"
}

So you can use it in instructions like this:

"instructions": {
  "message": "Today's conditions look like {{conditions}} with a high temperature of {{high.celsius}} degrees Celsius according to the forecast at {{pretty_date.time}}.",
  "subject": "{{data}}"
}

If you want to retain original contents of events and only add new keys, then set mode to merge, otherwise set it to clean.
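The matcher, instructions and mode behavior described above can be approximated in Python. The sketch rests on three assumptions. Python's `(?P<time>...)` replaces Ruby's `(?<time>...)` named-group syntax. Only the named group is copied into `pretty_date`, so the numbered keys are left out. `format_event`, with its tiny `{{a.b}}` substitution, is a hypothetical stand-in for Liquid.

```python
import re

event = {
    "high": {"celsius": "18"},
    "date": {"pretty": "10:00 PM EST on January 11, 2013"},
    "conditions": "Rain showers",
    "data": "This is some data",
}


def lookup(data, dotted):
    """Follow a dotted path such as 'high.celsius' through nested dicts."""
    for part in dotted.split("."):
        data = data[part]
    return data


def render(template, data):
    # Replace each {{a.b}} placeholder with the value at that dotted path.
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lambda m: str(lookup(data, m.group(1))), template)


def format_event(event, instructions, mode="clean"):
    m = re.match(r"\A(?P<time>\d\d:\d\d [AP]M [A-Z]+)", event["date"]["pretty"])
    context = dict(event, pretty_date=m.groupdict() if m else {})
    new = {key: render(tpl, context) for key, tpl in instructions.items()}
    # merge keeps the original keys; clean emits only the new ones.
    return {**event, **new} if mode == "merge" else new


instructions = {
    "message": "High of {{high.celsius}} C at {{pretty_date.time}}.",
    "subject": "{{data}}",
}
```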

To CGI escape output (for example when creating a link), use the Liquid uri_escape filter, like so:

{
  "message": "A peak was on Twitter in {{group_by}}.  Search: https://twitter.com/search?q={{group_by | uri_escape}}"
}
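This assumes Huginn's uri_escape behaves like Ruby's CGI.escape, that is, form-style encoding where a space becomes `+`. Under that assumption, Python's `urllib.parse.quote_plus` gives the same kind of result and is handy for previewing a link. The group_by value below is made up.

```python
from urllib.parse import quote_plus

group_by = "huginn & rss"
# Spaces become "+", and "&" becomes "%26", so the query string stays intact.
link = "https://twitter.com/search?q=" + quote_plus(group_by)
```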

Adioso Agent - flight price tracking

Creates events

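The Adioso Agent looks up the lowest airfare between two cities within a given time window. Prices are in US dollars. The difference between start_date and end_date must be less than 150 days. You need to register with Adioso and enter the credentials in username and password.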

Aftership Agent - package tracking (paid API)

Creates events

The Aftership Agent tracks your parcels and updates their status in real time. To use the Aftership API you need to generate an API key, and its tracking features require a paid plan.

Usage: Provide the path for the API endpoint that you’d like to hit. For example, for all active packages, enter trackings (see https://www.aftership.com/docs/api/4/trackings), for a specific package, use trackings/SLUG/TRACKING_NUMBER and replace SLUG with a courier code and TRACKING_NUMBER with the tracking number. You can request last checkpoint of a package by providing last_checkpoint/SLUG/TRACKING_NUMBER instead.

You can get a list of courier information here https://www.aftership.com/courier

Required Options:

api_key - YOUR_API_KEY.
path - request and its full path
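The three path forms described above can be built with a small helper. This is a sketch: `aftership_path` is a hypothetical name, and `example-courier` and `123456` are placeholders, not a real courier slug or tracking number.

```python
def aftership_path(slug=None, tracking_number=None, last_checkpoint=False):
    """Return a value for the Aftership Agent's path option."""
    if slug is None and tracking_number is None:
        return "trackings"                      # all active packages
    if not (slug and tracking_number):
        raise ValueError("slug and tracking_number must be given together")
    prefix = "last_checkpoint" if last_checkpoint else "trackings"
    return f"{prefix}/{slug}/{tracking_number}"
```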

Algorithmia Agent - AI algorithms

Creates events Receives events Dry runs huginn_algorithmia_agent

AlgorithmiaAgent runs algorithms from Algorithmia. You need to sign up for an Algorithmia account before using this Agent.

The created event will have the output from Algorithmia in the result key. To merge incoming Events with the result, use merge for the mode key.

Attribute Difference Agent - attribute differences (needs further study)

Creates events Receives events

The Attribute Difference Agent receives events and emits a new event with the difference or change of a specific attribute in comparison to the previous event received.

path specifies the JSON path of the attribute to be used from the event.

output specifies the new attribute name that will be created on the original payload and it will contain the difference or change.

method specifies if it should be…

percentage_change e.g. Previous value was 160, new value is 116. Percentage change is -27.5
decimal_difference e.g. Previous value was 5.5, new value is 15.2. Difference is 9.7
integer_difference e.g. Previous value was 50, new value is 40. Difference is -10

decimal_precision defaults to 3, but you can override this if you want.

expected_update_period_in_days is used to determine if the Agent is working.

The resulting event will be a copy of the received event with the difference or change added as an extra attribute. If you use the percentage_change the attribute will be formatted as such {{attribute}}_change, otherwise it will be {{attribute}}_diff.

All configuration options will be liquid interpolated based on the incoming event.
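The three methods and their examples above can be reproduced with a short Python sketch. Three points are assumptions. The function names are hypothetical. The event attribute is a plain key rather than a JSON path. The result goes into the key named by `output`. How the agent treats the very first event, which has no previous value, is not modeled.

```python
def attribute_difference(previous, current, method, decimal_precision=3):
    """Difference between the previous and current value of one attribute."""
    if method == "percentage_change":
        # previous must be non-zero for a percentage change.
        change = (float(current) - float(previous)) / float(previous) * 100
        return round(change, decimal_precision)
    if method == "decimal_difference":
        return round(float(current) - float(previous), decimal_precision)
    if method == "integer_difference":
        return int(current) - int(previous)
    raise ValueError(f"unknown method: {method}")


def add_difference(event, previous, path, output, method):
    """Copy the event and add the difference under the `output` key."""
    result = dict(event)
    result[output] = attribute_difference(previous, event[path], method)
    return result
```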

Basecamp Agent - service discontinued

Creates events 37signals

The Basecamp Agent checks a Basecamp project for new Events

To be able to use this Agent you need to authenticate with 37signals in the Services section first.

Bigquery Agent - large-scale data analysis

Creates events Receives events Dry runs huginn_bigquery_agent

The Bigquery Agent calls Google BigQuery through a Google Cloud account, which may incur charges. It authenticates with a service account rather than OAuth.

Setup:

Visit the google api console
Use your existing project (or create a new one)
APIs & Auth -> Enable BigQuery
Credentials -> Create new Client ID -> Service Account
Download the JSON keyfile and either save it to a path, ie: /home/huginn/Huginn-5d12345678cd.json, or copy the value of private_key from the file.
Grant that service account access to the BigQuery datasets and tables you want to query.

The JSON keyfile you downloaded earlier should look like this:

{
  "type": "service_account",
  "project_id": "huginn-123123",
  "private_key_id": "6d6b476fc6ccdb31e0f171991e5528bb396ffbe4",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "huginn@huginn-123123.iam.gserviceaccount.com",
  "client_id": "123123...123123",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://accounts.google.com/o/oauth2/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/huginn%40huginn-123123.iam.gserviceaccount.com"
}

Agent Configuration:

project_id - The id of the Google Cloud project.

query - The BigQuery query to run. Liquid formatting is supported to run queries based on receiving events.

use_legacy - Whether or not to use BigQuery legacy SQL or standard SQL. (Defaults to false)

max - Maximum number of rows to return. Defaults to unlimited, but results are always limited to 10 MB.

timeout - How long to wait for query to complete (in ms). Defaults to 10000.

event_per_row - Whether to create one Event per row returned, or one event with all rows as results. Defaults to false.

Authorization

keyfile - (String) The path (relative to where Huginn is running) to the JSON keyfile downloaded in step 5 above.

Alternately, keyfile can be a hash:

keyfile private_key - The private key (value of private_key from the downloaded file). Liquid formatting is supported if you want to use a Credential. (E.g., {% credential google-bigquery-key %})

keyfile client_email - The value of client_email from the downloaded file. Same as the service account email.

Boxcar Agent - iPhone notification app

Receives events

The Boxcar Agent sends push notifications to an iPhone, but Boxcar is not compatible with iOS 10 and is no longer updated.

To be able to use the Boxcar end-user API, you need your Access Token. The access token is available on general “Settings” screen of Boxcar iOS app or from Boxcar Web Inbox settings page.

Please provide your access token in the user_credentials option. If you'd like to use a credential, set the user_credentials option to {% credential CREDENTIAL_NAME %}.

Options:

user_credentials - Boxcar access token.
title - Title of the message.
body - Body of the message.
source_name - Name of the source of the message. Set to Huginn by default.
icon_url - URL to the icon.
sound - Sound to be played for the notification. Set to ‘bird-1’ by default.

Change Detector Agent - detect data changes

Creates events Receives events

The Change Detector Agent receives a stream of events and emits a new event when a property of the received event changes.

property specifies a Liquid template that expands to the property to be watched, where you can use a variable last_property for the last property value. If you want to detect a new lowest price, try this: {% assign drop = last_property | minus: price %}{% if last_property == blank or drop > 0 %}{{ price | default: last_property }}{% else %}{{ last_property }}{% endif %}

expected_update_period_in_days is used to determine if the Agent is working.

The resulting event will be a copy of the received event.
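The "new lowest price" template above renders the incoming price when there is no previous value or the price dropped, and keeps the last value otherwise. The agent then emits when that rendered value differs from the previous one. Below is a Python model of the same decision, assuming numeric prices and a hypothetical `LowestPriceDetector` class.

```python
class LowestPriceDetector:
    def __init__(self):
        self.last_property = None   # stands in for the agent's memory

    def receive(self, event):
        price = event["price"]
        last = self.last_property
        # Same decision as the Liquid template: first value or a price drop wins.
        prop = price if last is None or last - price > 0 else last
        changed = prop != last
        self.last_property = prop
        return dict(event) if changed else None   # emit a copy only on change
```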

Commander Agent - trigger commands

Receives events Controls agents

The Commander Agent is triggered by a schedule or an incoming event, and in turn makes other agents run, or disables, configures or enables them.

Action types

Set action to one of the action types below:

run: Target Agents are run when this agent is triggered.
disable: Target Agents are disabled (if not) when this agent is triggered.
enable: Target Agents are enabled (if not) when this agent is triggered.
configure: Target Agents have their options updated with the contents of configure_options.

Here's a tip: you can use Liquid templating to dynamically determine the action type. For example:

To create a CommanderAgent that receives an event from a WeatherAgent every morning to kick an agent flow that is only useful in nice weather, try this: {% if conditions contains 'Sunny' or conditions contains 'Cloudy' %}run{% endif %}
Likewise, if you have a scheduled agent flow specially crafted for rainy days, try this: {% if conditions contains 'Rain' %}enable{% else %}disable{% endif %}
If you want to update a WeatherAgent based on a UserLocationAgent, you could use 'action': 'configure' and set 'configure_options' to { 'location': '{{_location_.latlng}}' }.
In templating, you can use the variable target to refer to each target agent, which has the following attributes: name, options, sources, type, url, id, disabled, memory, controllers, schedule, keep_events_for, propagate_immediately, working, receivers, and control_targets.

Targets

Select Agents that you want to control from this CommanderAgent.

Csv Agent - CSV data processing

Creates events Receives events Consumes file pointer

CsvAgent parses or serializes CSV data. When parsing, it can handle the whole CSV at once or each row separately.

Set mode to parse to parse CSV from incoming events; when set to serialize, the agent serializes the data of events to CSV.

Universal options

Specify the separator which is used to separate the fields from each other (default is ,).

data_key sets the key which contains the serialized CSV or parsed CSV data in emitted events.

Parsing

If use_fields is set to a comma-separated string and the CSV file contains field headers, the agent will only extract the specified fields.

output determines whether one event per row is emitted or one event that includes all the rows.

Set with_header to true if first line of the CSV file are field names.

This agent can consume a ‘file pointer’ event from the following agents with no additional configuration: FtpsiteAgent, LocalFileAgent, S3Agent. Read more about the concept in the wiki.

When receiving the CSV data in a regular event use JSONPath to select the path in data_path. data_path is only used when the received event does not contain a 'file pointer'.

Serializing

If use_fields is set to a comma-separated string and the first received event has an object at the specified data_path, the generated CSV will only include the given fields.

Set with_header to true to include a field header in the CSV.

Use JSONPath in data_path to select which part of the received events should be serialized.
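The parse and serialize options described above map naturally onto Python's csv module. This is a sketch, not the agent's implementation. `parse_csv` and `serialize_csv` are hypothetical names, and the parameters only loosely mirror the documented options. The output values `event_per_row` and `event_per_file`, and `data` as the data_key, are assumptions.

```python
import csv
import io


def parse_csv(text, separator=",", with_header=True, use_fields=None, output="event_per_row"):
    rows = csv.reader(io.StringIO(text), delimiter=separator)
    if with_header:
        header = next(rows)
        records = [dict(zip(header, row)) for row in rows]
        if use_fields:                     # keep only the listed columns
            wanted = [f.strip() for f in use_fields.split(",")]
            records = [{k: r[k] for k in wanted} for r in records]
    else:
        records = [list(row) for row in rows]
    # One event per row, or a single event holding every row.
    if output == "event_per_row":
        return [{"data": r} for r in records]
    return [{"data": records}]


def serialize_csv(records, separator=",", with_header=True):
    buf = io.StringIO()
    fields = list(records[0].keys())
    writer = csv.DictWriter(buf, fieldnames=fields, delimiter=separator, lineterminator="\n")
    if with_header:
        writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```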

Data Output Agent - publish data on the web (RSS)

Receives events

The Data Output Agent outputs incoming data as RSS or JSON, served with a ".xml" or ".json" extension.

This Agent will output data at:

https:///users/1/web_requests/:id/:secret.xml

where :secret is one of the allowed secrets specified in your options and the extension can be xml or json.

You can setup multiple secrets so that you can individually authorize external systems to access your Huginn data.

Options:

secrets - An array of tokens that the requestor must provide for light-weight authentication.
expected_receive_period_in_days - How often you expect data to be received by this Agent from other Agents.
template - A JSON object representing a mapping between item output keys and incoming event values. Use Liquid to format the values. Values of the link, title, description and icon keys will be put into the <channel> section of RSS output. Value of the self key will be used as URL for this feed itself, which is useful when you serve it via reverse proxy. The item key will be repeated for every Event. The pubDate key for each item will have the creation time of the Event unless given.
events_to_show - The number of events to output in RSS or JSON. (default: 40)
ttl - A value for the <ttl> element in RSS output. (default: 60)
ns_media - Add yahoo media namespace in output xml
ns_itunes - Add itunes compatible namespace in output xml
rss_content_type - Content-Type for RSS output (default: application/rss+xml)
response_headers - An object with any custom response headers. (example: {"Access-Control-Allow-Origin": "*"})
push_hubs - Set to a list of PubSubHubbub endpoints you want to publish an update to every time this agent receives an event. (default: none) Popular hubs include Superfeedr and Google. Note that publishing updates will make your feed URL known to the public, so if you want to keep it secret, set up a reverse proxy to serve your feed via a safe URL and specify it in template.self.

If you'd like to output RSS tags with attributes, such as enclosure, use something like the following in your template:

"enclosure": {
  "_attributes": {
    "url": "{{media_url}}",
    "length": "1234456789",
    "type": "audio/mpeg"
  }
},
"another_tag": {
  "_attributes": {
    "key": "value",
    "another_key": "another_value"
  },
  "_contents": "tag contents (can be an object for nesting)"
}
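The `_attributes` / `_contents` convention above turns template keys into XML attributes and tag text. The rough Python model below uses xml.etree.ElementTree to show the shape of the resulting tags. The `to_element` helper is hypothetical, and the agent's own serializer may differ in details such as escaping and whitespace.

```python
import xml.etree.ElementTree as ET


def to_element(tag, value):
    """Build an XML element from a Data Output Agent-style template value."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, val in value.get("_attributes", {}).items():
            elem.set(key, str(val))              # each entry becomes an attribute
        contents = value.get("_contents")
        if isinstance(contents, dict):           # nested tags
            for child_tag, child_val in contents.items():
                elem.append(to_element(child_tag, child_val))
        elif contents is not None:
            elem.text = str(contents)
    else:
        elem.text = str(value)
    return elem
```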

Ordering events

To specify the order of events, set events_order to an array of sort keys, each of which looks like either expression or [expression, type, descending], as described as follows:

expression is a Liquid template to generate a string to be used as sort key.
type (optional) is one of string (default), number and time, which specifies how to evaluate expression for comparison.
descending (optional) is a boolean value to determine if comparison should be done in descending (reverse) order, which defaults to false.

Sort keys listed earlier take precedence over ones listed later. For example, if you want to sort articles by the date and then by the author, specify [["{{date}}", "time"], "{{author}}"].

Sorting is done stably, so even if all events have the same set of sort key values the original order is retained. Also, a special Liquid variable _index_ is provided, which contains the zero-based index number of each event, which means you can exactly reverse the order of events by specifying [["{{_index_}}", "number", true]].

DataOutputAgent will select the last events_to_show entries of its received events sorted in the order specified by events_order, which is defaulted to the event creation time. So, if you have multiple source agents that may create many events in a run, you may want to either increase events_to_show to have a larger "window", or specify the events_order option to an appropriate value (like date_published) so events from various sources are properly mixed in the resulted feed.

There is also an option events_list_order that only controls the order of events listed in the final output, without attempting to maintain a total order of received events. It has the same format as events_order and is defaulted to [["{{_index_}}","number",true]] so the selected events are listed in reverse order like most popular RSS feeds list their articles.
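How events_order, events_to_show and events_list_order interact can be modeled with Python's stable sort. This is a simplified sketch. Sort keys are plain functions instead of Liquid templates, `select_for_feed` is a hypothetical name, and only the default events_list_order (reverse by index) is implemented.

```python
def select_for_feed(events, events_order, events_to_show=40):
    """events: payload dicts in arrival order.
    events_order: list of (key_func, descending) pairs, highest priority first."""
    indexed = [dict(e, _index_=i) for i, e in enumerate(events)]
    # Stable sorts: apply the lowest-priority key first, the highest-priority key last.
    for key_func, descending in reversed(events_order):
        indexed.sort(key=key_func, reverse=descending)
    selected = indexed[-events_to_show:]          # the last N after ordering
    # Default events_list_order [["{{_index_}}","number",true]]: list newest first.
    return list(reversed(selected))
```

The multi-pass approach works because Python's sort is stable even with `reverse=True`, so ties keep the order produced by the lower-priority keys.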

Liquid Templating

In Liquid templating, the following variable is available:

events: An array of events being output, sorted in the given order, up to events_to_show in number. For example, if source events contain a site title in the site_title key, you can refer to it in template.title by putting {{events.first.site_title}}.

De Duplication Agent - de-duplicate data

Creates events Receives events

After receiving data, the De-duplication Agent automatically compares it and drops duplicates.

property the value that should be used to determine the uniqueness of the event (empty to use the whole payload)

lookback amount of past Events to compare the value to (0 for unlimited)

expected_update_period_in_days is used to determine if the Agent is working.

Delay Agent - buffer

Creates events Receives events

The Delay Agent stores the events it receives and emits copies of them on a schedule. Use it as a buffer or queue for events.

max_events should be set to the maximum number of events that you'd like to hold in the buffer. When this number is reached, new events will either be ignored, or will displace the oldest event already in the buffer, depending on whether you set keep to newest or oldest.

expected_receive_period_in_days is used to determine if the Agent is working. Set it to the maximum number of days that you anticipate passing without this Agent receiving an incoming Event.

max_emitted_events is used to limit the maximum number of events that are created. If you omit it, DelayAgent will create events for every event stored in memory.
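The buffering rules above (max_events, keep, max_emitted_events) can be sketched as a small queue. Two assumptions apply: `DelayBuffer` is a hypothetical name, and each emission takes the oldest buffered events first, leaving the rest for the next run.

```python
from collections import deque


class DelayBuffer:
    def __init__(self, max_events, keep="newest", max_emitted_events=None):
        self.max_events = max_events
        self.keep = keep
        self.max_emitted_events = max_emitted_events
        self.buffer = deque()

    def receive(self, event):
        if len(self.buffer) >= self.max_events:
            if self.keep == "oldest":
                return                      # buffer full: ignore the new event
            self.buffer.popleft()           # keep newest: displace the oldest
        self.buffer.append(event)

    def check(self):
        """Emit buffered events (oldest first), up to max_emitted_events."""
        count = len(self.buffer)
        if self.max_emitted_events is not None:
            count = min(count, self.max_emitted_events)
        return [self.buffer.popleft() for _ in range(count)]
```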

Digest Agent - digest (not fully understood yet)

Creates events Receives events

The Digest Agent collects any events sent to it and emits them as a single event.

The resulting Event will have a payload message of message. You can use liquid templating in the message, have a look at the Wiki for details.

Set expected_receive_period_in_days to the maximum amount of time that you'd expect to pass between Events being received by this Agent.

If retained_events is set to 0 (the default), all received events are cleared after a digest is sent. Set retained_events to a value larger than 0 to keep a certain number of events around on a rolling basis to re-send in future digests.

For instance, say retained_events is set to 3 and the Agent has received Events 5, 4, and 3. When a digest is sent, Events 5, 4, and 3 are retained for a future digest. After Event 6 is received, the next digest will contain Events 6, 5, and 4.
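The retained_events example above can be replayed with a short sketch. It assumes the queue is trimmed to retained_events as events arrive and that a digest lists every queued event, oldest first. `DigestSketch` is a hypothetical name.

```python
class DigestSketch:
    def __init__(self, retained_events=0):
        self.retained_events = retained_events
        self.queue = []

    def receive(self, event):
        self.queue.append(event)
        # With retained_events > 0 the queue is a rolling window of that size.
        if self.retained_events > 0 and len(self.queue) > self.retained_events:
            self.queue = self.queue[-self.retained_events:]

    def digest(self):
        """Return the queued events as one digest (oldest first)."""
        batch = list(self.queue)
        if self.retained_events == 0:
            self.queue = []     # default: clear after each digest
        return batch
```

Replaying the documented scenario: events 3, 4 and 5 produce a digest of all three, and after event 6 arrives the next digest holds 4, 5 and 6.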

Dkt Clustering Agent - data mining algorithms?

Creates events Receives events Consumes file pointer Dry runs huginn_dkt_curation_agents

huginn_dkt_curation_agents uses the DKT APIs and contains several agents; see the link above for details.

The DktClusteringAgent clusters the input document collection. The document collection first has to be converted to a set of vectors.

The Agent expects the input in this particular format and then proceeds to find clusters in this input data. The output contains information on the number of clusters found and specific values for the found clusters.

The Agent accepts all configuration options of the /e-clustering/generateClusters endpoint as of September 2016; have a look at the official documentation if you need additional information.

All Agent configuration options are interpolated using Liquid in the context of the received event.

url allows you to customize the endpoint of the API when hosting the DKT services elsewhere.

body use Liquid templating to specify the input .arff file. See http://www.cs.waikato.ac.nz/ml/weka/arff.html for an explanation of this format.

language language of the source data

algorithm: the algorithm to be used during clustering. Currently EM and Kmeans are supported.

merge set to true to retain the received payload and update it with the extracted result

result_key when present the emitted Event data will be nested inside the specified key
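Because body expects an .arff document, it can help to see what a minimal one looks like. The sketch below builds an ARFF text from numeric document vectors; the helper `to_arff` is hypothetical, and which features the DKT endpoint actually expects (term weights, embeddings, etc.) is not stated here, so treat the attribute names as placeholders. It also assumes relation and attribute names contain no spaces, since ARFF would require quoting them.

```python
def to_arff(relation, attributes, rows):
    """Build a minimal ARFF document with NUMERIC attributes (hypothetical helper)."""
    lines = [f"@RELATION {relation}", ""]
    for name in attributes:
        lines.append(f"@ATTRIBUTE {name} NUMERIC")
    lines += ["", "@DATA"]
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("each row needs exactly one value per attribute")
        # One comma-separated line per document vector.
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


print(to_arff("docs", ["w1", "w2"], [[0.5, 1], [0, 2]]))
```

The resulting text could then be produced by a Liquid template in body, or by an upstream agent whose output body interpolates.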

When receiving a file pointer:

body will be ignored and the contents of the received file will be sent instead.

This agent can consume a ‘file pointer’ event from the following agents with no additional configuration: FtpsiteAgent, LocalFileAgent, S3Agent. Read more about the concept in the wiki.

Dropbox File Url Agent

Creates events Receives events Dry runs Dropbox oauth2

The DropboxFileUrlAgent works with Dropbox: it takes a file path (or several file paths) and emits events containing a temporary or permanent link to each file.

Include the dropbox-api and omniauth-dropbox gems in your Gemfile and set DROPBOX_OAUTH_KEY and DROPBOX_OAUTH_SECRET in your environment to use Dropbox Agents.

The incoming event payload needs to have a paths key, with a comma-separated list of files you want the URL for. For example:

```
{
  "paths": "first/path, second/path"
}
```

TIP: You can use the Event Formatting Agent to format events before they come in. Here’s an example configuration for formatting an event coming out of a Dropbox Watch Agent:

```
{
  "instructions": {
    "paths": "{{ added | map: 'path' | join: ',' }}"
  },
  "matchers": [],
  "mode": "clean"
}
```
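In plain Python terms, the Liquid expression `added | map: 'path' | join: ','` collects the path of each added file and joins them with commas. The sketch below mirrors that, plus a hypothetical `split_paths` showing how a comma-separated paths value (with optional spaces, as in the first example) could be turned back into a list; how the agent itself parses the string is an assumption here.

```python
def format_paths(added):
    # Equivalent of {{ added | map: 'path' | join: ',' }}
    return ",".join(item["path"] for item in added)


def split_paths(paths):
    # Hypothetical parsing of the "paths" value: split on commas, trim spaces.
    return [p.strip() for p in paths.split(",") if p.strip()]


added = [{"path": "/a.txt"}, {"path": "/b.txt"}]
print(format_paths(added))                        # /a.txt,/b.txt
print(split_paths("first/path, second/path"))     # ['first/path', 'second/path']
```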

An example of usage would be to watch a specific Dropbox directory (with the DropboxWatchAgent) and get the URLs for the added or updated files. You could then, for example, send emails with those links.

Set link_type to 'temporary' if you want temporary links, or to 'permanent' for permanent ones.

Dropbox Watch Agent

Creates events Dropbox oauth2

The Dropbox Watch Agent monitors the given dir_to_watch and emits events for the changes it detects.

Include the dropbox-api and omniauth-dropbox gems in your Gemfile and set DROPBOX_OAUTH_KEY and DROPBOX_OAUTH_SECRET in your environment to use Dropbox Agents.
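Change detection of this kind boils down to comparing two snapshots of the directory. The sketch below assumes a hypothetical snapshot format (file path mapped to a revision id) and returns added, removed and updated entries; the "added" items carry a path key, which matches the `added | map: 'path'` tip above, but the exact event shape Huginn emits is an assumption.

```python
def diff_snapshots(previous, current):
    """Compare two {path: revision} snapshots (hypothetical format)."""

    def entries(paths, source):
        return [{"path": p, "rev": source[p]} for p in sorted(paths)]

    added = current.keys() - previous.keys()
    removed = previous.keys() - current.keys()
    updated = {p for p in current.keys() & previous.keys() if current[p] != previous[p]}
    return {
        "added": entries(added, current),
        "removed": entries(removed, previous),
        "updated": entries(updated, current),
    }


before = {"/a.txt": "1", "/b.txt": "1"}
after = {"/a.txt": "2", "/c.txt": "1"}
print(diff_snapshots(before, after))
```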

Email Agent

Receives events

The Email Agent sends the events it has just received as email notifications.

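You can specify the email's subject line with the subject option, which may contain Liquid formatting. For example, provide "Huginn email" for a simple subject, or {{subject}} to use the subject key from the incoming event.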

By default, the email body contains an optional "headline" followed by a list of the event's keys.

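You can customize the email body with the optional body parameter. Like subject, body can be a simple message or a Liquid template. To send only the event's some_text field, set body to {{ some_text }}. The body can contain simple HTML, which will be sanitized. Note that when body is used, it is wrapped in <html> and <body> tags, so you do not need to add them yourself.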
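To see how {{subject}} or {{ some_text }} pull values out of the incoming event, here is a deliberately simplified stand-in for Liquid variable substitution. It handles only plain `{{ key }}` and dotted `{{ a.b }}` lookups (no filters or tags) and, like Liquid's default behaviour, renders unknown variables as an empty string; the `render` function is hypothetical, not part of Huginn.

```python
import re

# Matches {{ key }} or {{ a.b }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template, payload):
    """Replace {{ key }} placeholders with values from payload (simplified, not real Liquid)."""

    def lookup(match):
        value = payload
        for part in match.group(1).split("."):
            if not isinstance(value, dict) or part not in value:
                return ""  # missing variables render as empty text
            value = value[part]
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


event = {"subject": "New post", "some_text": "Hello <b>world</b>"}
print(render("{{subject}}", event))      # New post
print(render("{{ some_text }}", event))  # Hello <b>world</b>
```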

You can specify one or more recipients for the email, or skip the option to send the email to your account's default email address.

You can provide a from address for the email, or leave it blank to default to the value of EMAIL_FROM_ADDRESS (``).

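You can provide a content_type for the email and specify text/plain or text/html to be sent. If you do not specify content_type, the recipient's email server will determine the correct rendering.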

Set expected_receive_period_in_days to the maximum amount of time you expect to pass between events received by this Agent.

Email Digest Agent - email digests

Receives events

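The Email Digest Agent collects any events sent to it and emails them on a schedule. Which events are included depends on the Keep events setting: if events expire before the Email Digest Agent's scheduled run, they will not appear in the email.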
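The expiry rule can be reasoned about with simple date arithmetic: an event is still available at the digest run only if its creation time plus the Keep events period lies after the run time. The sketch below assumes that rule and treats a zero period as "keep forever"; the helper `digest_contents` and the event dicts are hypothetical. In practice Huginn removes expired events with a periodic cleanup, so an expired event may linger briefly.

```python
from datetime import datetime, timedelta


def digest_contents(events, keep_events_for, run_at):
    """IDs of events still alive at run_at (assumes expiry = created_at + keep_events_for)."""
    return [
        e["id"]
        for e in events
        # A zero timedelta is falsy: treat it as "keep forever".
        if not keep_events_for or e["created_at"] + keep_events_for > run_at
    ]


t0 = datetime(2024, 1, 1)
events = [
    {"id": 1, "created_at": t0},
    {"id": 2, "created_at": t0 + timedelta(days=2)},
]
# Keep events for 1 day, digest runs on day 2 at 01:00: event 1 has already expired.
print(digest_contents(events, timedelta(days=1), t0 + timedelta(days=2, hours=1)))  # [2]
```

The practical takeaway matches the paragraph above: set Keep events on the upstream agents to at least the interval between digest runs.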

By default, the email will have a subject and an optional headline before listing the Events. If the Events' payloads contain a message, that will be highlighted; otherwise everything in their payloads will be shown.

You can specify one or more recipients for the email, or skip the option in order to send the email to your account's default email address.

You can provide a from address for the email, or leave it blank to default to the value of EMAIL_FROM_ADDRESS (``).

You can provide a content_type for the email and specify text/plain or text/html to be sent. If you do not specify content_type, then the recipient email server will determine the correct rendering.

Set expected_receive_period_in_days to the maximum amount of time that you'd expect to pass between Events being received by this Agent.

Evernote Agent

Creates events Receives events Evernote

The Evernote Agent connects to your Evernote account and creates new notes.

Visit Evernote to set up an Evernote app and receive an api key and secret. Store these in the Evernote environment variables in the .env file. You will also need to create a Sandbox account to use during development.

Next, you'll need to authenticate with Evernote in the Services section.

Options:

mode - Two possible values:
- update Based on events it receives, the agent will create notes or update notes with the same title and notebook
- read On a schedule, it will generate events containing data for newly added or updated notes
include_xhtml_content - Set to true to include the content in ENML (Evernote Markup Language) of the note
note
- When mode is update the parameters of note are the attributes of the note to be added/edited. To edit a note, both title and notebook must be set.
  For example, to add the tags 'comic' and 'CS' to a note titled 'xkcd Survey' in the notebook 'xkcd', use:
```
"note": {
  "title": "xkcd Survey",
  "content": "",
  "notebook": "xkcd",
  "tagNames": "comic, CS"
}
```
  If a note with the above title and notebook did not exist already, one would be created.
- When mode is read the values are search parameters. Note: The content parameter is not used for searching. Setting title only filters notes whose titles contain title as a substring, not as the exact title.
  For example, to find all notes with tag 'CS' in the notebook 'xkcd', use:
```
"note": {
  "title": "",
  "content": "",
  "notebook": "xkcd",
  "tagNames": "CS"
}
```
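The read-mode rules above (title as a substring filter, content ignored) can be illustrated with a local filter over note dicts. This is only an approximation for reasoning about results: the real search is performed by Evernote, and the exact-notebook match and "every listed tag must be present" semantics in the hypothetical `note_matches` are assumptions.

```python
def note_matches(note, search):
    """Approximate the read-mode search rules locally (hypothetical helper)."""
    # Notebook must match exactly when given (assumption).
    if search.get("notebook") and note["notebook"] != search["notebook"]:
        return False
    # title is a substring filter, not an exact match.
    if search.get("title") and search["title"] not in note["title"]:
        return False
    # Every listed tag must be present (assumed AND semantics); content is ignored.
    wanted = {t.strip() for t in search.get("tagNames", "").split(",") if t.strip()}
    return wanted <= set(note["tags"])


note = {"title": "xkcd Survey results", "notebook": "xkcd", "tags": ["comic", "CS"]}
print(note_matches(note, {"title": "", "content": "", "notebook": "xkcd", "tagNames": "CS"}))  # True
print(note_matches(note, {"title": "Survey", "notebook": "xkcd"}))                             # True
```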

Freme Explore Agent - SPARQL data endpoints

Creates events Receives events Dry runs

huginn_freme_enrichment_agents uses the FREME APIs and contains several agents; see the huginn_freme_enrichment_agents project for details.

The FremeExploreAgent can retrieve the description of a resource from a given endpoint. The endpoint can be a SPARQL or Linked Data Fragments endpoint.

The Agent accepts all configuration options of the /e-link/explore endpoint as of September 2016; see the official documentation if you need additional information.

All Agent configuration options are interpolated using Liquid in the context of the received event.

base_url lets you customize the API server when the FREME services are hosted elsewhere.

auth_token can be set to access private filters, datasets, templates or pipelines (depending on the agent).

outformat the requested RDF serialization format of the output (required); CSV is only supported when using a filter.

resource a URI of the resource which should be described (required).

endpoint a URL of the endpoint which should be used to retrieve info about the resource.

endpoint_type the type of the endpoint (required).

filter allows to post-process the results using a pre-configured SPARQL filter. Check the official documentation for details.

merge set to true to retain the received payload and update it with the extracted result

result_key when present the emitted Event data will be nested inside the specified key

Ftpsite Agent

Creates events Receives events Emits file pointer

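The Ftp Site Agent checks an FTP site and creates events based on newly uploaded files in a directory. When it receives events, it creates files on the configured FTP server.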

mode must be present and set to either read or write. In read mode the agent checks the FTP site for changed files; in write mode it writes received events to a file on the server.

Universal options

Specify a url that represents a directory of an FTP site to watch, and a list of patterns to match against file names.

Login credentials can be included in url if authentication is required: ftp://username:password@ftp.example.com/path. Liquid formatting is supported as well: ftp://{% credential ftp_credentials %}@ftp.example.com/

Optionally specify the encoding of the files you want to read or write in force_encoding; UTF-8 is used by default.

Reading

Only files with a last modification time later than the after value, if specified, are emitted as events.
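The read-mode selection (pattern match on the file name, then an optional after cut-off) and the credentials-in-URL form can both be illustrated with the Python standard library. The listing format (file name mapped to modification time) and the helper `files_to_emit` are hypothetical; `fnmatchcase` is used so matching is case-sensitive on every platform, which may differ from the agent's own pattern matching.

```python
from datetime import datetime
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def files_to_emit(listing, patterns, after=None):
    """listing maps file name -> last modification time (hypothetical input format)."""
    return sorted(
        name
        for name, mtime in listing.items()
        if any(fnmatchcase(name, pattern) for pattern in patterns)
        and (after is None or mtime > after)
    )


listing = {
    "report.csv": datetime(2024, 1, 2),
    "old.csv": datetime(2023, 12, 1),
    "notes.txt": datetime(2024, 1, 3),
}
print(files_to_emit(listing, ["*.csv"], after=datetime(2024, 1, 1)))  # ['report.csv']

# Credentials embedded in the url are split out like this by urlparse.
url = urlparse("ftp://username:password@ftp.example.com/path")
print(url.username, url.password, url.hostname, url.path)  # username password ftp.example.com /path
```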

Writing

Specify the file name to use in filename; Liquid interpolation can be used to vary the name per event.

Use Liquid templating in data to specify which part of the received event should be written.

This agent only emits a ‘file pointer’, not the data inside the files; the following agents can consume the created events: CsvAgent, PostAgent, ReadFileAgent. Read more about the concept in the wiki.