Hacking SearX Docker, Nginx and Cloudflare together the wrong way

SearX is a great metasearch engine that aggregates results from multiple search engines, giving you privacy while searching.
A list of public instances can be found at https://searx.space/ - however, there is no way to know what logging those public instances are doing. Some public instances sit behind Cloudflare, which is OK - but some tend to set the sensitivity too high, which ruins the experience. Note that Cloudflare can see everything - but for a personal instance you do need it to stop bots.
A better solution is to create your own instance and share it with your friends. The sharing step is as important as the setup - otherwise it is effectively the same as using a single-user proxy. But think twice before setting up a public instance unless you know what you are doing.
SearX has an official Docker Compose repo at https://github.com/searx/searx-docker - but I am already running Nginx on 443, so I need to hack the setup to make the new containers work alongside it. Make sure you read https://github.com/searx/searx-docker#what-is-included- and understand which part does what.
Grab this repo, edit the .env file as instructed, and run ./start.sh once. Don't worry about issues: we will hack through them.

Hacking Caddyfile

I should not really be running Caddy behind Nginx, but to make it work:

  1. Remove all Morty-related content.
  2. If you want to use Cloudflare, edit the Content-Security-Policy header and add https://ajax.cloudflare.com/ to script-src after 'self'; otherwise Rocket Loader won't work.
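The Content-Security-Policy change in step 2 can be sketched in Python. This is only an illustration of the string edit: the sample policy below is hypothetical and is not the header shipped in the searx-docker Caddyfile, and the helper name add_script_source is made up for this example.

```python
def add_script_source(csp: str, source: str) -> str:
    """Return the policy with `source` appended to its script-src directive."""
    directives = [d.strip() for d in csp.split(";") if d.strip()]
    updated = []
    found = False
    for directive in directives:
        name, _, value = directive.partition(" ")
        if name.lower() == "script-src":
            found = True
            sources = value.split()
            if source not in sources:  # keep the edit idempotent
                sources.append(source)
            directive = " ".join([name] + sources)
        updated.append(directive)
    if not found:
        # No script-src directive at all: add one that still allows 'self'.
        updated.append(f"script-src 'self' {source}")
    return "; ".join(updated) + ";"


if __name__ == "__main__":
    # Hypothetical starting policy, for illustration only.
    policy = "default-src 'none'; script-src 'self';"
    print(add_script_source(policy, "https://ajax.cloudflare.com/"))
```

Running it on the sample policy keeps 'self' and adds the Cloudflare origin to script-src only; every other directive is left untouched.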

Hacking searx/settings.yml

You need to change the Morty-related settings at the end. Hardcode your Morty URL, like https://search.fancy.tld/morty .

Hacking docker-compose.yml

  1. For Caddy: bind 80 to another host port, like 4180:80.
  2. For morty: limit port 3000 to localhost only.
  3. For searx: hardcode the Morty-related URL.

Hacking .env

  1. Put localhost:4180 in host so Caddy won't take port 80 from Nginx.
  2. Use HTTP only. We shall do SSL with Nginx.

Hacking rules.json

Remove the block deflate part if you need Cloudflare.

Hacking Nginx

Try this setup:

upstream searx {
  server localhost:4180;
  keepalive 64;
}
upstream morty {
  server localhost:3000;
  keepalive 64;
}
server {
    listen       80;
    listen       [::]:80;
    listen       443 ssl;
    listen       [::]:443 ssl;
    server_name  fancy.search.tld;
    ssl_certificate /etc/nginx/ssl/fancy.search.tld.pem;
    ssl_certificate_key /etc/nginx/ssl/fancy.search.tld.key;
    ssl_session_timeout 5m;
    ssl_protocols  TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS;
    ssl_prefer_server_ciphers on;
    keepalive_timeout 70;
    ssl_session_cache shared:SSL:10m;
    ssl_dhparam /etc/nginx/ssl/dhparams.pem;
    location / {
        proxy_buffering off;
        proxy_http_version 1.1;
        proxy_set_header Connection "";  # Need this or morty will complain
        proxy_pass http://searx;
    }
    location /morty {
        proxy_buffering off;
        proxy_http_version 1.1;
        proxy_set_header Connection "";  # Need this or morty will complain
        proxy_pass http://morty;
    }
}

Note that you must use an upstream block for the reverse proxy, or morty will complain.
With all of this in place you should have something more or less usable. Wait for the checker to finish to get an optimized list of engines to enable - and note that Qwant and DDG both use Bing results, while Startpage is a watered-down Google.
If you want to set your SearX as the default search engine in Chrome: visit your site, go to chrome://settings/searchEngines and your engine should be selectable. You may need to change the URL.
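If the URL does need changing, the template Chrome expects uses %s as the placeholder for the query. A minimal sketch, assuming the instance answers searches at /search with a q parameter (an assumption about the SearX routes, not something configured above); the function name is made up for this example:

```python
from urllib.parse import urlunsplit


def chrome_search_template(host: str, path: str = "/search") -> str:
    """Build a search-engine URL template with Chrome's %s placeholder."""
    # urlunsplit does not escape the query, so %s survives as-is.
    return urlunsplit(("https", host, path, "q=%s", ""))


if __name__ == "__main__":
    print(chrome_search_template("fancy.search.tld"))
```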

Customize Microsoft Sculpt Ergonomic Desktop on macOS

Recently I got 2 sets of the Microsoft Sculpt Ergonomic Desktop keyboard & mouse bundle (https://www.microsoft.com/accessories/en-ca/products/keyboards/sculpt-ergonomic-desktop/l5v-00002) for use at the office and at home.
Quick review:

  • The keyboard's wrist rest is soft, but it gets dirty easily.
  • The F keys are small buttons rather than ordinary keys. Do not purchase if you use the F keys often.
  • The keys are easy to type on.
  • The buttons on the mouse are soft.
  • The keypad is separate. If you need the keypad often, consider https://www.microsoft.com/en-ca/p/surface-ergonomic-keyboard/90pnc9ljwpx9?activetab=pivot%3aoverviewtab
  • If you purchase the mouse and the keyboard separately you will have 2 USB-A dongles, while the bundle needs only 1. You cannot split the set, and the dongle is not reprogrammable.

Another version of this keyboard exists: https://www.microsoft.com/en-ca/p/surface-ergonomic-keyboard/90pnc9ljwpx9?activetab=pivot%3aoverviewtab
To get this keyboard & mouse working on macOS you will need the following software:

  • Karabiner at https://pqrs.org/osx/karabiner/
  • Mos at https://mos.caldis.me/
  • SensibleSideButtons at https://sensible-side-buttons.archagon.net/

All of them are open source.
Steps:

  1. Set the switch at the right corner of the keyboard to Fn.
  2. Open Karabiner-Elements.app, select the keyboard, and:
    • swap the left command and option keys
    • map right_gui to mouse5 (Mouse buttons - button5)
    • remap F7~F9 to match the symbols printed on the keyboard.
  3. Open Mos.app. Adjust the scrolling as needed, and maybe invert the scroll direction.
  4. Open SensibleSideButtons.app. Enable it.

Result:

  • The F keys map to the media keys
  • The Command and Option keys match Mac keyboards
  • Scrolling is smoothed
  • The Back button on the mouse goes back; the Windows button goes forward (at least in Chrome)

Missing:

  • Calculator key: it's not showing up in keyboard events
  • Double-tap is missing since no mouse would support tapping except for Magic Mouse 2

V2Ray WebSocket+TLS+Web+Nginx+CDN

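Shadowsocks (SS) is getting blocked badly and the tutorials online are vague, so here are my notes.
How it works: Nginx (or Caddy) terminates TLS, and V2Ray handles the connection inside.
Beginners should not jump straight into WebSocket+TLS+Web+Nginx+CDN: there are too many places where things can go wrong.
Please read the whole post before doing anything, especially the Notes sections.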

Preparation

  • SSL certificate: https://freessl.cn
  • Domain: https://www.freenom.com , or find one yourself

WebSocket

WebSocket can be connected to directly or placed behind a CDN.

Server

{
  "inbounds": [
    {
      "port": 10001,  # local port; any port that does not conflict will do
      "listen":"127.0.0.1", # V2Ray listens on localhost only
      "protocol": "vmess",
      "settings": {
        "clients": [
          {
            "id": "d111702d-8604-4358-b1fa-xxxxxxxxx",
            "alterId": 64
          }
        ]
      },
      "streamSettings": {
        "network": "ws",
        "wsSettings": {
          "path": "/ray/" # the trailing slash must match the Nginx location
        }
      }
    }
  ],
  "outbounds": [
    {
      "protocol": "freedom",
      "settings": {}
    }
  ]
}

server {
  server_name           {subdomain.domain.tld};
  listen 8.8.8.8:443 ssl;
  listen [2001::]:443 ssl; # make Nginx listen on IPv6 as well
  ssl on;
  ssl_certificate       /etc/nginx/ssl/xxx.pem; # SSL certificate location
  ssl_certificate_key   /etc/nginx/ssl/xxx.key;
  ssl_protocols         TLSv1 TLSv1.1 TLSv1.2;
  ssl_ciphers           HIGH:!aNULL:!MD5;
  location /ray/ {
        proxy_redirect off;
        proxy_pass http://127.0.0.1:10001; # the port must match the V2Ray inbound above
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        # Show realip in v2ray access.log
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $host; # this line is required
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
     }
}
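
The id field in the V2Ray inbound above is a UUID that the client has to repeat exactly. A minimal sketch for generating one and sanity-checking a pasted value with the Python standard library (the helper names are made up for this example):

```python
import uuid


def new_client_id() -> str:
    """Generate a random UUID suitable for the "id" field."""
    return str(uuid.uuid4())


def is_valid_client_id(value: str) -> bool:
    """Return True if `value` parses as a UUID."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    print(new_client_id())
    # The masked placeholder from the config is not a valid UUID.
    print(is_valid_client_id("d111702d-8604-4358-b1fa-xxxxxxxxx"))
```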

Client

{
  "log": {
    "error": "",
    "loglevel": "debug",
    "access": ""
  },
  "inbounds": [
    {
      "listen": "127.0.0.1",
      "protocol": "socks",
      "settings": {
        "ip": "",
        "userLevel": 0,
        "timeout": 0,
        "udp": false,
        "auth": "noauth"
      },
      "port": "8091" # local port of the SOCKS proxy
    }
  ],
  "outbounds": [
    {
      "mux": {
        "enabled": false,
        "concurrency": 8
      },
      "protocol": "vmess",
      "streamSettings": {
        "wsSettings": {
          "path": "/ray/", # must match the server path above
          "headers": {
            "host": ""
          }
        },
        "tlsSettings": {
          "allowInsecure": true
        },
        "security": "tls",
        "network": "ws"
      },
      "tag": "",
      "settings": {
        "vnext": [
          {
            "address": "the domain configured above",
            "users": [
              {
                "id": "same UUID as on the server",
                "alterId": 64,
                "level": 0,
                "security": "auto"
              }
            ],
            "port": 443 # the port Nginx listens on
          }
        ]
      }
    }
  ],
  "dns": {
    "servers": [
      ""
    ]
  },
  "routing": {
    "strategy": "rules",
    "settings": {
      "domainStrategy": "IPIfNonMatch",
      "rules": [
        {
          "outboundTag": "direct",
          "type": "field",
          "ip": [
            "geoip:cn",
            "geoip:private"
          ],
          "domain": [
            "geosite:cn",
            "geosite:speedtest"
          ]
        }
      ]
    }
  },
  "transport": {}
}

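Notes

  • If you put it behind Cloudflare:
    • With a self-signed certificate (really not necessary), SSL must be set to Flexible: otherwise CF reports a certificate error
    • With a properly issued SSL certificate, SSL must be set to Full: otherwise Nginx may serve some arbitrary site
    • With a new domain: wait until the SSL certificate has been issued before proceeding
    • Cloudflare's free certificate only covers first-level subdomains (*.domain.tld): for deeper subdomains, pay up or arrange your own SSL.
  • Use curl for debugging. Under no circumstances should the status code be 404.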
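
A mismatched path is an easy mistake to make here, since the same /ray/ string has to appear in the V2Ray server config, the client config and the Nginx location. A minimal sketch of a consistency check, assuming both configs have been loaded as plain JSON (without the # comments used in the listings above); the function names are made up for this example:

```python
def ws_path(config: dict, section: str) -> str:
    """Return the WebSocket path of the first entry in `section`."""
    return config[section][0]["streamSettings"]["wsSettings"]["path"]


def paths_match(server: dict, client: dict, nginx_location: str) -> bool:
    """Check that server inbound, client outbound and Nginx agree on the path."""
    server_path = ws_path(server, "inbounds")
    client_path = ws_path(client, "outbounds")
    return server_path == client_path == nginx_location


if __name__ == "__main__":
    server = {"inbounds": [{"streamSettings": {"wsSettings": {"path": "/ray/"}}}]}
    client = {"outbounds": [{"streamSettings": {"wsSettings": {"path": "/ray"}}}]}
    # The missing trailing slash on the client side is reported as a mismatch.
    print(paths_match(server, client, "/ray/"))
```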

HTTP2 with Caddy

Cloudflare gives WebSocket traffic on the free plan a low priority, so HTTP2 may make browsing faster. Of course, no CDN supports this setup.

Server

https://DOMAIN:2053 {
    root /usr/share/nginx/html/
    tls /etc/nginx/ssl/public.pem /etc/nginx/ssl/private.key { # or let Caddy obtain a certificate from Let's Encrypt itself
        ciphers ECDHE-ECDSA-WITH-CHACHA20-POLY1305 ECDHE-ECDSA-AES256-GCM-SHA384 ECDHE-ECDSA-AES256-CBC-SHA
        curves p384
        key_type p384
    }
    proxy /v2ray https://localhost:12000 { # the local port V2Ray listens on
        insecure_skip_verify
        transparent
        header_upstream X-Forwarded-Proto "https"
        header_upstream Host "DOMAIN"
    }
    header / {
        Strict-Transport-Security "max-age=31536000;"
        X-XSS-Protection "1; mode=block"
        X-Content-Type-Options "nosniff"
        X-Frame-Options "DENY"
    }
}

{
  "inbounds": [
    {
      "port": 12000, # local port to listen on
      "listen": "127.0.0.1", # listen on localhost only
      "protocol": "vmess",
      "settings": {
        "clients": [
          {
            "id": "UUID",
            "alterId": 64
          }
        ]
      },
      "streamSettings": {
        "network": "h2",
        "httpSettings": {
          "path": "/v2ray",
          "host": ["DOMAIN"]
        },
        "security": "tls",
        "tlsSettings": {
          "certificates": [
            {
              "certificateFile": "/etc/nginx/ssl/public.pem",
              "keyFile": "/etc/nginx/ssl/private.key"
            }
          ]
        }
      }
    }
  ],
  "outbounds": [
    {
      "protocol": "freedom",
      "settings": {}
    }
  ]
}

Client

{
  "inbounds": [
    {
      "port": 8091,
      "listen": "127.0.0.1",
      "protocol": "socks"
    }
  ],
  "outbounds": [
    {
      "protocol": "vmess",
      "settings": {
        "vnext": [
          {
            "address": "DOMAIN",
            "port": 2053, # the port Caddy listens on
            "users": [
              {
                "id": "the same UUID",
                "alterId": 64
              }
            ]
          }
        ]
      },
      "streamSettings": {
        "network": "h2",
        "httpSettings": {
          "path": "/v2ray",
          "host": ["DOMAIN"]
        },
        "security": "tls"
      }
    }
  ]
}

Note

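  • Nginx cannot proxy HTTP2 to the upstream: its author considers it unnecessary. Only Caddy can be used.
  • If you want to use a CDN: many CDNs (Cloudflare, for example) support HTTP2, but what we need is HTTP2 on the connection back to the origin. I have not found one that offers this yet.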

Maruko Toolbox (小丸工具箱): a beginner's guide

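This is a beginner's guide to Maruko Toolbox (小丸工具箱). It explains only the toolbox's most important features in detail. Some features may have changed in later releases, so treat it as a learning reference only.
This guide is based on version 236.

Tab 1: Video

This is where you encode video.
Usually you add the video under "Single video encoding", adjust the parameters you need under "General parameters", and click Encode.
Batch encoding works the same way, except that you click the Encode button in the lower-right corner.
For ordinary encodes you only need to adjust the CRF value and the video resolution; there is no need to touch anything more advanced.
Note that Maruko automatically detects a subtitle file with the same name in the same folder as the video: after you add a video under "Single video encoding", it adds the matching subtitle file automatically.
In the batch-encoding list the video's file name turns blue, as in the screenshot above; tick "Embed subtitles" on the right and the subtitles are burned in. If the name is black, no subtitle was associated.

Video options in detail

Encoder

32-bit or 64-bit does not matter; just match your system. (For 4K or above, prefer the 64-bit encoder, otherwise it can easily run out of memory.)
"X264_GCC" is the x264 build compiled with GCC.
There is no need to agonize over the choice: for ordinary encodes there is no difference.

Audio mode:

Maruko offers 3 modes here: copy, encode, and no audio stream.
Copy: as the name suggests, the audio is copied over as-is. Before using this mode, check whether the source audio can be muxed into the container you are encoding to.
Encode: the settings for this option live on the "Audio" tab and are shared with it.
No audio stream: the output simply has no audio.

Demuxer:

This is normally set to auto; leave it alone unless you have a reason to change it.
Audio/video desync in the output is very often caused by the demuxer. If that happens, attach your encoding log when reporting the problem on the forum or in the chat group.

Start frame and frame count:

Both default to 0. If you only need to encode a short segment of the video, set "Start frame" to the frame where encoding should begin and "Frame count" to the number of frames to encode.
Note that this feature must be used with "No audio".

Encoding mode

CRF:

The quality value generally ranges from 1 to 51, and 21-25 is usually enough. The larger the value, the lower the bitrate: 21 gives a high-bitrate encode, while 24 is enough for online playback.
"Quality" here is x264's CRF (Constant Rate Factor). This rate-control method is good enough that 2-pass encoding is unnecessary: even a single pass distributes the bitrate very well. Many people do not know what bitrate to target when encoding; CRF allocates bitrate as needed.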
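
For readers who prefer the command line, the CRF, start frame and frame count settings correspond to options of the x264 command-line encoder. The sketch below only assembles an argument list; it is an assumption about how one would call x264 directly, not the command Maruko Toolbox actually generates, and the function name is made up for this example.

```python
def build_x264_command(source, output, crf=24.0, start_frame=0, frame_count=0):
    """Assemble an x264 argument list for a CRF encode of an optional segment."""
    # x264 accepts 0 (lossless) to 51 for 8-bit encodes.
    if not 0 <= crf <= 51:
        raise ValueError("crf must be between 0 and 51")
    command = ["x264", "--crf", str(crf)]
    if start_frame > 0:
        command += ["--seek", str(start_frame)]    # first frame to encode
    if frame_count > 0:
        command += ["--frames", str(frame_count)]  # maximum number of frames
    command += ["-o", output, source]
    return command


if __name__ == "__main__":
    # Encode 500 frames starting at frame 1000, at CRF 21.
    print(" ".join(build_x264_command("in.mp4", "out.mp4", 21, 1000, 500)))
```

As with the GUI option, a segment encoded this way carries no audio, since x264 only produces the video stream.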

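2Pass:

It is mainly for controlling file size and bitrate. How good the result looks depends on the bitrate you set and the bitrate your video actually needs.
Because 2-pass has drawbacks such as "encoding twice wastes time", "more chances for something to go wrong" and "the result is not that great", we generally do not recommend it.

Custom:

This is for writing your own parameters, which directly override all of Maruko's built-in x264 parameters, except for these 4 options: encoder, audio mode, start frame and frame count.
Before using it, make sure you have some understanding of x264 parameters; do not blindly copy someone else's parameters and start encoding.

Tab 2: Audio

Both audio and video files can be processed here into a standalone audio file;
just choose a different encoder.
Note that the settings here are shared with "Encode audio" on the Video tab. After encoding other audio on its own, remember to switch back to the AAC encoder, otherwise video encoding will fail.
The audio-merge feature is slightly buggy: it works in a pinch, but is not recommended.

Tab 3: Common

This tab has 3 main features:
1. Still-image video
2. Lossless video cutting
3. Video rotation

Still-image video

Audio bitrate

Encodes the audio to the bitrate you enter; if you do not need to re-encode, just tick "Copy audio stream".
With "Copy audio stream" ticked, do not feed it a raw AAC file directly: mux it into M4A or MP4 first, or it will not work properly.

FPS

It is best to enter a common frame rate such as 23 or 30. If you use 1, seeking with the progress bar becomes difficult.

CRF

Same as CRF for encoding; the default is fine.

Duration

After you drag the audio in, Maruko detects its length in seconds automatically; if the detection is wrong, you can correct it by hand.

Other

Start time and end time

The time format is hours:minutes:seconds. Just make sure the end time is later than the start time and click "Cut".
The total duration after cutting may be slightly off; this is normal and does not affect use.
Because of how ffmpeg's lossless cutting works, it cannot cut exactly to the second, only at a keyframe near the requested time.
If you need second-level accuracy, use an editor such as Premiere (PR) or 爱剪辑, or cut with ffmpeg while re-encoding.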
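
The lossless cut can be pictured as an ffmpeg stream copy. The sketch below builds such a command from a start and end time in hours:minutes:seconds; it is an illustration of the idea under that assumption, not the exact command the toolbox runs, and the helper names are made up for this example.

```python
def to_seconds(timestamp: str) -> int:
    """Convert H:M:S into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in timestamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def build_cut_command(source, output, start, end):
    """Assemble an ffmpeg stream-copy command cutting from start to end."""
    duration = to_seconds(end) - to_seconds(start)
    if duration <= 0:
        raise ValueError("end time must be later than start time")
    # -ss before -i seeks in the input; -c copy avoids re-encoding,
    # which is why the cut can only begin at a keyframe.
    return ["ffmpeg", "-ss", start, "-i", source,
            "-t", str(duration), "-c", "copy", output]


if __name__ == "__main__":
    print(" ".join(build_cut_command("in.mp4", "cut.mp4", "00:01:00", "00:02:30")))
```

Dropping "-c copy" makes ffmpeg re-encode, which is slower but lets the cut land on the requested second.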

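Transpose

Choose the rotation you want and click "Rotate".
This operation also requires re-encoding.

Tab 4: Mux

Merge into MP4

Muxes separate H264 and AAC files together.

FPS and PAR

Options you will almost never need to change; feel free to ignore them.

Replace audio

Replaces the audio in a video with the audio you add.

Merge into MKV

Only basic MKV muxing is provided here; for anything more complex, use software such as MKVExtractGUI.

Batch mux

Any video in AVC+AAC format can be batch-muxed here into MP4/MKV/FLV/AVI and other formats.
Note that any audio that is not AAC will be converted to AAC.

Tab 5: AVS

I won't go into detail: those who use it mostly understand it already, and it is complicated to teach to those who don't, so go find an AVS tutorial.

Tab 6: MediaInfo

Shows video information; essential when reporting problems.

Tab 7: Settings

Interface language

4 languages are supported: Simplified Chinese, Traditional Chinese, English and Japanese.

Tray mode

All this does is show a Maruko icon in the tray while encoding: hovering over the icon shows brief progress information, and a notification pops up when encoding finishes.

X264 priority

Sets the priority of the encoder process. It will not speed up or slow down encoding much; at most it lets you just about play a game while encoding.

X264 threads

Generally Maruko's x264 can use at most 16 threads, so leaving it on auto is fine. If you need to limit the threads, take your total number of CPU "boxes" (logical cores), subtract the number you want to keep free, and the difference is the x264 thread count.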
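
The thread arithmetic is simple enough to write down. A minimal sketch, assuming the 16-thread ceiling mentioned above and using os.cpu_count() as the number of logical cores (the function name is made up for this example):

```python
import os
from typing import Optional


def x264_threads(reserved: int, total: Optional[int] = None, cap: int = 16) -> int:
    """Threads for x264: logical cores minus reserved ones, capped at `cap`."""
    if total is None:
        total = os.cpu_count() or 1
    # Never return fewer than 1 thread, even if too many cores are reserved.
    return max(1, min(cap, total - reserved))


if __name__ == "__main__":
    # Keep 2 logical cores free on this machine.
    print(x264_threads(reserved=2))
```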

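X264 custom command line

This custom setting does not override all of Maruko's interface parameters; it only overrides the built-in ones.
That is, the two parameters --crf 24.0 --threads 16 are kept.

Preview player

Specify the player used for AVS preview here.

Delete all temporary files on exit

With this enabled, temporary files such as batch scripts in the ../MarukoToolbox/tools folder are deleted automatically when the program exits.

Enable X265

Tick it to use x265. The feature is still being tested, so it is off by default.

Restore default settings

Click here if you have broken Maruko. ゚∀゚)σ

View log

Click to view the latest log file; older log files are in the ../MarukoToolbox/logs folder.

Delete logs

Clicking this deletes the entire logs folder.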

From anywhere to AWS Lambda in one line with Zappa

The problem

We always want continuous integration and deployment for our repos, and Bitbucket comes with a handy build function.
Releasing versions with Zappa is easy: zappa update xxx makes a release, and zappa rollback xxx -n 3 reverts the changes.
But Zappa is currently broken on Python 3.7: Zappa uses async as a package name, while Python 3.7 makes async and await reserved keywords.
Locally I use Python 3.7 on macOS, but I have to support Windows + macOS + Ubuntu + CentOS: how can I quickly make a release from anywhere?

Solution

Local

Refer to https://blog.zappa.io/posts/docker-zappa-and-python3.
LambCI has made a couple of Docker images that would simulate AWS Lambda, located at https://github.com/lambci/docker-lambda , which provides handy shell access.

With CI

With some hacking we can make a Docker image for release, as in https://blog.zappa.io/posts/simplified-aws-lambda-deployments-with-docker-and-zappa . But this image only supports Python 2.7.
A Python 3.6 version is located at https://cloud.docker.com/repository/docker/cnbeining/zappa3 . And we can have a one-liner:
docker run -e AWS_SECRET_ACCESS_KEY=xxxxxxxxx -e AWS_ACCESS_KEY_ID=AKXXXXXXXXXXX -e AWS_DEFAULT_REGION=us-west-2 -v $(pwd):/var/task --rm cnbeining/zappa3 bash -c "virtualenv -p python3 docker_env && source docker_env/bin/activate && pip install -r requirements.txt && zappa update && rm -rf docker_env"
This command mounts your current folder, creates a virtual environment, installs all the requirements, updates the deployment, and removes all the garbage.
One note: DO NOT SET profile in zappa_settings.json. The image logs in automatically with the keys you pass in.
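In a CI script it can be more convenient to assemble the same one-liner from environment variables than to paste keys into a command. A minimal sketch that builds the argument list for subprocess; the function name and the idea of reading the keys from a mapping are assumptions for illustration, while the image name and the inner shell command are taken from the one-liner above.

```python
import os

INNER_COMMAND = (
    "virtualenv -p python3 docker_env && source docker_env/bin/activate && "
    "pip install -r requirements.txt && zappa update && rm -rf docker_env"
)

AWS_VARIABLES = ("AWS_SECRET_ACCESS_KEY", "AWS_ACCESS_KEY_ID", "AWS_DEFAULT_REGION")


def build_release_command(project_dir, env=None, image="cnbeining/zappa3"):
    """Return the docker argument list for a Zappa release of project_dir."""
    env = os.environ if env is None else env
    command = ["docker", "run"]
    for name in AWS_VARIABLES:
        # A missing variable raises KeyError before anything is run.
        command += ["-e", f"{name}={env[name]}"]
    command += ["-v", f"{project_dir}:/var/task", "--rm", image,
                "bash", "-c", INNER_COMMAND]
    return command


if __name__ == "__main__":
    import subprocess
    subprocess.run(build_release_command(os.getcwd()), check=True)
```

Passing a list to subprocess.run avoids another layer of shell quoting around the bash -c string.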

Reference:

https://blog.zappa.io/posts/continuous-zappa-deployments-with-travis

Flask from Docker to Lambda with Zappa: the more-or-less complete guide

TLDR

A step-by-step guide to how FleetOps migrated its Docker-based Flask API to AWS Lambda.

History

At FleetOps.ai we use Docker extensively when building APIs. Our API is built on Flask, with microservices supporting async features.
Since we are moving microservices to AWS Lambda ... What if the main API could also run on Lambda?

AWS Lambda & Serverless

Serverless is probably the hottest word in the DevOps world in 2018. Doesn't sound very interesting?
Compared to PaaS (Google App Engine, Heroku, OpenShift V2, Sina App Engine, etc.): serverless does not have a severe vendor lock-in problem. Most of the time you do not need to edit ANYTHING to migrate to serverless. You CAN choose to write the code the PaaS way, and if you don't fancy that, a DIY approach is still available. In this case I did not make any change to the original codebase!
Compared to Docker: although Docker is more flexible and gives you a full Linux OS inside the container, it is still hard to manage when scaling. Kubernetes is good, but the DevOps burden is heavy. At FleetOps we do not want to put that much energy into DevOps, let alone for a hobby project.
Compared to web hosting: serverless supports more languages (Java, Node, etc.) that are hard to get in the hosting world.

Problem/limits with AWS Lambda

To name a few:

  • Does not support ALL languages the way Docker does, and definitely not ALL versions of Python. AWS is working on a super lightweight OS image, so maybe we will see something different?
  • You have to bring your own binaries/libraries should you want to use any special software, and they have to be statically linked, while with Docker you can do anything you want. That does not sound too bad, but:
  • Code size limit: if you love 3rd-party libraries it may be very hard to fit everything into one zipball. Technically you can fetch them from S3 on the fly when the function is invoked, BUT:
  • Cold start problem: you have absolutely no control over the life cycle of those functions. God bless you if your function needs 10s to start.
  • Hard max runtime: 900s is the limit. Maybe you can get it raised but YMMV.
  • Stateless: like a container committing suicide after every invocation.
  • No access to special hardware, like GPU.
  • No debugger: do some print()s instead.
  • Confusing networking: I will try to sort out this issue in this article.

So if your task:

  • does not require any special technology, and uses the most common stack
  • is stateless, or is able to recover state from other services (which should be the standard for every API - at least at FleetOps we ensure that every API call is stateless)
  • does not run forever and does not consume lots of memory
  • does not really benefit from JIT or similar caching
  • is not super huge
  • does not use fancy hardware
  • has an uneven workload

Then you could benefit from AWS Lambda.

The Guide

1. Get ready

We use Python 3.6 for the API for now.
Get a requirements.txt ready. Not there yet? pip freeze > requirements.txt.
On your dev machine, make a virtual environment: (ref: https://docs.python-guide.org/dev/virtualenvs/)

pip install virtualenv
virtualenv venv
source venv/bin/activate

Install Zappa(https://github.com/Miserlou/Zappa ): pip install zappa
Get your AWS CLI ready: pip install boto3 and refer to steps in https://pypi.org/project/boto3/ . Make sure that account has full access to S3, Lambda, SQS, API Gateway, and the whole network stack.

2. Some observations and calculations:

  • Where is your main function? Make a note of that.
  • How much memory do you need? If you cannot give a definite number yet, leave it for now.
  • What is your target VPC & security group? Note their IDs.
  • What 3rd-party binaries do you need? Compile them as statically linked binaries - you cannot easily call apt-get on the remote machine!
  • Do you need any environment variables? There are different ways of setting them, and I am using the easiest approach - putting them in the config JSON.

3. Get the Internet right!

Further reading: https://gist.github.com/reggi/dc5f2620b7b4f515e68e46255ac042a7
Quote from @reggi 's article:

So it might be really unintuitive at first but lambda functions have three states.
1. No VPC, where it can talk openly to the web, but can't talk to any of your AWS services.
2. VPC, the default setting where the lambda function can talk to your AWS services but can't talk to the web.
3. VPC with NAT, The best of both worlds, AWS services and web.

Use 1. if you do not need this function to access any AWS service, or you only need the function to access them via the Internet. Use 2. if you are building a private API. And for FleetOps, we are going down path 3.
Note that not all AWS services are accessible from within the VPC: e.g., S3 and RDS are accessible from the VPC, while SQS and DynamoDB require Internet access, even if you are calling from within Lambda.
My recommended steps are:

  1. Create Internet Gateway.
  2. Create 4 subnets.
  3. Create NAT Gateway.
  4. Create Route table.

Take note of the 3 private subnet IDs.
We will use Zappa to configure the networking. Note that if you want to deploy the function to multiple AZs, you may need to repeat the steps, once for each AZ.

4. Wrap it up

Get back to your virtual env, and activate it.
Do a zappa init. You will be asked the following questions:

Your Zappa configuration can support multiple production stages, like 'dev', 'staging', and 'production'.
What do you want to call this environment (default 'dev'):

Use whatever name you like: you can carry this stage's configuration over to further stages.

Your Zappa deployments will need to be uploaded to a private S3 bucket.
If you don't have a bucket yet, we'll create one for you too.
What do you want to call your bucket? (default 'zappa-xxxxxxxxxx'):

By default, Zappa will only use this bucket when uploading/updating the function.

It looks like this is a Flask application.
What's the modular path to your app's function?
This will likely be something like 'your_module.app'.
We discovered: jinjaTemplates.app_template.app, v2.app.app
Where is your app's function? (default 'jinjaTemplates.app_template.app'):

Put in the entry-point function.

You can optionally deploy to all available regions in order to provide fast global service.
If you are using Zappa for the first time, you probably don't want to do this!
Would you like to deploy this application globally? (default 'n') [y/n/(p)rimary]: n

This depends on your use case.
Now you may want to edit zappa_settings.json: all the arguments are listed at https://github.com/Miserlou/Zappa#advanced-settings but this is the basic configuration that gets our API running:

{
    "dev": {
        "app_function": "v2.run.app", // entrance function
        "profile_name": null, // boto3 profile
        "project_name": "FleetOpsAPI", // a name
        "runtime": "python3.6",  // Refer to AWS for list. Zappa only supports Python 2.7 and 3.6 for now.
        "s3_bucket": "zappa-xxxxxx",  // code temp bucket
        "memory_size": 256,  // Memory. You will pay for per second memory use so choose wisely!
        "environment_variables": {  // Everything that used to live in export
            "ENV": "dev",
           ..........
        },
        "vpc_config": {
            "SubnetIds": ["subnet-xxxxxxxx"],  // Put down your subnet IDs. We use all 3 AZs within the same region and I recommend you do the same.
            "SecurityGroupIds": ["sg-xxxxxx"]  // Security group for access to other AWS services.
        }
    }
}
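
The // comments in the listing above are annotations for the reader and are not valid JSON. One way to avoid hand-editing mistakes is to generate the file from Python. A minimal sketch that uses only the keys shown above; the function names and the placeholder values are assumptions for illustration:

```python
import json


def build_zappa_settings(stage, app_function, project_name, s3_bucket,
                         subnet_ids, security_group_ids,
                         memory_size=256, environment=None):
    """Return a zappa_settings dictionary for one stage."""
    return {
        stage: {
            "app_function": app_function,
            "profile_name": None,  # serialized as null: no boto3 profile
            "project_name": project_name,
            "runtime": "python3.6",
            "s3_bucket": s3_bucket,
            "memory_size": memory_size,
            "environment_variables": dict(environment or {}),
            "vpc_config": {
                "SubnetIds": list(subnet_ids),
                "SecurityGroupIds": list(security_group_ids),
            },
        }
    }


def dump_settings(settings):
    """Serialize the settings as comment-free JSON."""
    return json.dumps(settings, indent=4)


if __name__ == "__main__":
    settings = build_zappa_settings(
        "dev", "v2.run.app", "FleetOpsAPI", "zappa-xxxxxx",
        ["subnet-xxxxxxxx"], ["sg-xxxxxx"], environment={"ENV": "dev"},
    )
    with open("zappa_settings.json", "w") as handle:
        handle.write(dump_settings(settings))
```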

Zappa provides TONS of settings, but I am not using all of them: you can use a selective set of features to make sure you do not get vendor lock-in. For example, Lambda can handle URL routing by itself, but I am not using that, to avoid any kind of lock-in. This way you can easily take the code and put it back in a container if you wish.
Zappa does provide some exciting features:

  • Setting AWS environment variables: if you prefer to keep the secret key in another place.
  • Auto-packing huge projects: if your project is >50M, Zappa will handle that.
  • Keep warm: uses CloudWatch to make sure there is always one function running.

Save zappa_settings.json.

5. PROFIT!

Do a pip install -r requirements.txt to install all the packages.
Now do a zappa deploy.
You would see:

Downloading and installing dependencies..
 - pymongo==3.7.2: Using locally cached manylinux wheel
 - pycrypto==2.6.1: Using precompiled lambda package
 - protobuf==3.6.1: Using locally cached manylinux wheel
 - msgpack==0.6.0: Using locally cached manylinux wheel
 - markupsafe==1.1.0: Using locally cached manylinux wheel
 - greenlet==0.4.15: Using locally cached manylinux wheel
 - gevent==1.3.7: Using locally cached manylinux wheel
 - sqlite==python36: Using precompiled lambda package
Packaging project as zip.
Uploading xxxxxx-dev-1546239781.zip (22.9MiB)..
100%|██████████| 24.0M/24.0M [00:00<00:00, 77.6MB/s]
Updating Lambda function code..
Updating Lambda function configuration..
Uploading xxxxxxx.json (1.6KiB)..
100%|██████████| 1.63K/1.63K [00:00<00:00, 84.7KB/s]
Deploying API Gateway..
Scheduling..
Unscheduled xxxxxx-dev-zappa-keep-warm-handler.keep_warm_callback.
Scheduled xxxxx-dev-zappa-keep-warm-handler.keep_warm_callback with expression rate(4 minutes)!
Your updated Zappa deployment is live!: https://xxxxxx.execute-api.us-west-2.amazonaws.com/dev

And you now have a serverless API ready to serve!

6. Clean up

You want to do the following tasks to save $$$, boost performance and secure the setup:

  • Review the CloudWatch logs and set the memory to a reasonable value afterwards.
  • Adjust the warmer period.
  • Adjust API Gateway caching.
  • Set up cron jobs if you have them: either with Zappa or with CloudWatch.
  • Narrow the scope of the IAM user for Zappa: the default one is super powerful.
  • Adjust X-Ray if you need it.

Conclusion

Migrating a Flask API to serverless can be painless. I did not change a single line of code, and there is no vendor lock-in as every step can be reproduced with a Dockerfile.
Good luck with your journey with serverless!

Memo: chaining SOCKS and SOCKS/HTTP proxies

Use case:

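  • A plain SOCKS proxy will certainly not get through the GFW
  • Public SS servers trigger CAPTCHAs on Google, or their IPs are blacklisted
  • Chrome's SOCKS proxy support does not handle password authentication

Method:

  1. Install Proxifier locally. Configure SS: do not use global mode, just have it listen on a port. Have a working SOCKS/HTTP proxy ready.
  2. Configure both proxies in Proxifier.
  3. Set up the proxy chain as shown in the figure:

  4. In Proxifier, apply this chain to your browser, as shown in the figure (the example uses curl):

  5. Your browser now gets over the wall via SS, but with a different exit IP.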

$ curl "http://ip-api.com/json"
{"as":"AS32489 Amanah Tech Inc.","city":"Toronto","country":"Canada","countryCode":"CA","isp":"Amanah Tech","lat":43.6683,"lon":-79.4205,"org":"Amanah Tech","query":"184.75..xxx","region":"ON","regionName":"Ontario","status":"success","timezone":"America/Toronto","zip":"M6G"}

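Let the data speak: Visa or MasterCard when travelling abroad?

Going into 2018, no foreign transaction fee (no-FX) credit cards suddenly became all the rage. With everyone tightening their belts and short on cash, the 2.5% currency conversion fee on traditional credit cards is basically a tax paid to the card company.
Brim Financial's Brim (WE)MC, flying the no-FX banner, harvested over a hundred thousand people's personal information with a slide deck, yet the cards are still nowhere in sight; Rogers launched the Rogers WEMC, one of the "three treasures" of frugal cardholders, with 4% cashback on foreign-currency purchases; Scotiabank's Passport Visa Infinite officially became the bank's flagship travel card; Home Trust's Preferred Visa has become a decent drawer card thanks to having no annual fee and working at US Costco; and there are plenty of prepaid card options as well.
Regular visitors to 小黄网 will know that Visa and MasterCard are certainly not kind enough to convert at the mid-market rate. So how much higher are their rates? The consensus in the trading chat groups is 0.5%. Is it really that much?
greedyrates has studied this question (https://www.greedyrates.ca/blog/mastercard-or-visa-foreign-purchases-better-canadians/), but with a fairly small dataset: it only sampled weekly, while exchange rates change by the minute. This study aims to address that shortcoming.
Method:
Data sources: Visa, MasterCard (MC below), and mid-market rate data. All differences in this post are expressed in percentage points. The date range runs from 364 days before the report date (September 10, 2017) to the report date (September 8, 2018), because MasterCard only provides one year of historical data.

  • Visa rates come from https://usa.visa.com/support/consumer/travel-support/exchange-rate-calculator.html . 9 days have no data and are filled with the following day's data.
  • MasterCard rates come from https://www.mastercard.us/en-us/consumers/get-support/convert-currency.html .
  • Mid-market rates come from https://openexchangerates.org/ . The data is the mid-market rate at each day's close. Processing may introduce an error of less than one part in ten million.

All data was collected with a crawler. Each dataset was spot-checked 5 times to verify that the crawler works. The crawler source code is public.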
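
Filling the 9 missing Visa days with the following day's value can be sketched as below. The function assumes the list is ordered from oldest to newest with None marking a missing day; note that the crawler at the end of this post builds its date lists newest first, so its results would need reversing before applying this. The sample numbers are made up and are not from the study's data.

```python
def fill_from_next_day(rates):
    """Replace each None with the next available later value."""
    filled = list(rates)
    next_known = None
    # Walk backwards so each gap sees the value of the following day.
    for index in range(len(filled) - 1, -1, -1):
        if filled[index] is None:
            filled[index] = next_known
        else:
            next_known = filled[index]
    return filled


if __name__ == "__main__":
    # Illustrative values only.
    print(fill_from_next_day([1.30, None, None, 1.32, None]))
```

A gap at the very end of the series has no later day to borrow from, so it stays None.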
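Findings:
First, the full picture:

Dizzying? Let's go through it bit by bit:

The chart above shows, over time, by how many percentage points Visa's rate exceeds MasterCard's. Overall, Visa's rate is higher than MasterCard's.
On average Visa is 0.209 percentage points higher than MasterCard, but the standard deviation is 0.438, which means the difference is not statistically significant: over the past year MasterCard beat Visa on only 271 days.
The chart below makes this clearer:

Most of the time, MasterCard's rate is a little better than Visa's.

Visa is about 0.449 percentage points above the mid-market rate, with a standard deviation of 0.611. So a Visa card that merely waives the FX fee is not enough: in extreme cases a 1% cashback gets eaten up entirely.

Compared to Visa, MasterCard is less greedy: it charges 0.240 percentage points over the mid-market rate, with a standard deviation of 0.454. Most no-FX MCs need not worry about losing money: even in the worst case the Rogers WEMC still leaves you about 0.8%.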
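
The means and standard deviations quoted above can be reproduced from the raw daily rates with the standard library. This is a sketch under an assumption: the spread is taken as (card rate - reference rate) / reference rate in percent, which may differ from the formula used in the spreadsheet. The sample rates are made up for illustration.

```python
from statistics import mean, stdev


def spread_percent(card_rates, reference_rates):
    """Daily spread of the card rate over the reference rate, in percent."""
    return [(card - ref) / ref * 100
            for card, ref in zip(card_rates, reference_rates)]


def summarize(spreads):
    """Mean, sample standard deviation and number of days with a positive spread."""
    return {
        "mean": mean(spreads),
        "stdev": stdev(spreads),
        "days_positive": sum(1 for value in spreads if value > 0),
    }


if __name__ == "__main__":
    # Illustrative rates only, not taken from the study.
    visa = [1.3150, 1.3090, 1.3200]
    mid = [1.3100, 1.3095, 1.3120]
    print(summarize(spread_percent(visa, mid)))
```

Passing the MasterCard series as the reference instead of the mid-market series gives the Visa versus MasterCard comparison.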
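Conclusions:

  1. Use the Home Trust Visa with caution, as you may lose money; the Scotiabank Passport VI is fine.
  2. Most MasterCards will not lose you money.
  3. Just swipe the MC without thinking, unless it is not accepted.

Questions this study could not resolve:

  1. Both Visa and MC have runs of several days with an unchanged rate, even though the FX market cannot be that still. The problem is more pronounced with Visa. It is unclear whether Visa's system is glitching or Visa's traders are just lazy.
  2. I could not find AMEX data at all: readers who have it are welcome to contribute.
  3. XE data is too expensive, so this study uses openexchangerates data as the mid-market rate. It may be less precise than XE, but that should not overturn the conclusions.

Appendix:

  1. Raw data: https://docs.google.com/spreadsheets/d/1uwTFxSuQsJxey_KP3pMO1h2xZmj0buSO5YSP3tzFMMU/edit#gid=0
  2. Crawler code: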
#!/usr/bin/env python
#coding:utf-8
# Author:  Beining --<i at cnbeining.com>
# Purpose: Research: Visa vs MC
# Created: 09/08/2018
import datetime
import re
from multiprocessing.dummy import Pool as ThreadPool

import requests
#----------------------------------------------------------------------
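def get_visa_usd_cad_exchange_rate_by_date(date_string):
    """Scrape Visa's calculator for date_string (MM/DD/YYYY); return (date_string, converted CAD amount or None)."""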
    url = "https://usa.visa.com/support/consumer/travel-support/exchange-rate-calculator.html"
    params = (
        ('amount', '100'),
        ('fee', '0.0'),
        ('exchangedate', date_string),
        ('fromCurr', 'CAD'),
        ('toCurr', 'USD'),
        ('submitButton', 'Calculate exchange rate'),
    )
    headers = {
        'authority': "usa.visa.com",
        'pragma': "no-cache",
        'Cache-Control': "no-cache",
        'upgrade-insecure-requests': "1",
        'user-agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36",
        'dnt': "1",
        'accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        'accept-encoding': "gzip, deflate, br",
        'accept-language': "en-CA,en;q=0.9,zh-CN;q=0.8,zh;q=0.7,en-GB;q=0.6,en-US;q=0.5",
        }
    response = requests.get(url, headers=headers, params=params)
    if response.ok and 'converted-amount-value' in response.text:
        price_find = re.search( r'<strong class="converted-amount-value"> (.+) Canadian Dollar', response.text)
        if price_find:
            price_find = price_find.groups()
        else:
            return (date_string, None)
        if len(price_find) > 0:
            return (date_string, float(price_find[0]))
    return (date_string, None)
#----------------------------------------------------------------------
def get_mc_usd_cad_rate_by_date(date_str):
    """Return the CAD amount MasterCard bills for 100 USD on date_str (YYYY-MM-DD), or None on failure."""
    url = "https://www.mastercard.us/settlement/currencyrate/fxDate={date_str};transCurr=USD;crdhldBillCurr=CAD;bankFee=0;transAmt=100/conversion-rate".format(date_str = date_str)
    headers = {
        'user-agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36",
        'referer': "https://www.mastercard.us/en-us/consumers/get-support/convert-currency.html",
        'Cache-Control': "no-cache",
        }
    response = requests.get(url, headers=headers)
    if response.ok:
        return float(response.json()['data']['crdhldBillAmt'])
    return None
#----------------------------------------------------------------------
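def get_middle_usd_cad_rate_by_date(date_str):
    """Return the USD/CAD mid-market rate from openexchangerates for date_str (YYYY-MM-DD), or None on failure."""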
    url = "https://openexchangerates.org/api/historical/{date_str}.json?app_id=11ff5e6d97d74131abe05942bae6796e&base=usd&symbols=cad".format(date_str = date_str)
    headers = {
        'user-agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36",
        'Cache-Control': "no-cache",
        }
    response = requests.get(url, headers=headers)
    if response.ok:
        return float(response.json()['rates']['CAD'])
    return None
#----------------------------------------------------------------------
def execute_multiprocess(func, iterable, thread_num = 8):
    pool = ThreadPool(thread_num)
    result = pool.map(func, iterable)
    pool.close()
    pool.join()
    return result
mc_date_list = [(datetime.date.today() - datetime.timedelta(days = x)).strftime('%Y-%m-%d') for x in range(0, 364)]
visa_date_list = [(datetime.date.today() - datetime.timedelta(days = x)).strftime('%m/%d/%Y') for x in range(0, 364)]
mc_result = execute_multiprocess(get_mc_usd_cad_rate_by_date, mc_date_list, thread_num = 8)
visa_result = execute_multiprocess(get_visa_usd_cad_exchange_rate_by_date, visa_date_list, thread_num = 16)
middle_result = execute_multiprocess(get_middle_usd_cad_rate_by_date, mc_date_list, thread_num = 8)
visa_result.count(None)  # 9