The current deployment uses the following technologies:
- Docker
  - Package models and dependencies in a single container
  - Distribute docker images in Docker Hub
- AWS Lambda
  - Handle GitHub webhooks
- AWS API Gateway
  - Expose the Lambda handler as an API
- AWS Batch
- AWS S3
- GitHub API / webhooks
  - Extract data to build datasets
  - Consume webhook events to trigger the automation
Even though most of the services are deployed in the cloud, all the primitives can be self-hosted. The idea is that a webhook from GitHub triggers the automation, and a simple HTTP API handles the request and spawns a task.
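The "spawn a task" step can be sketched as a small shell function, assuming the task runs as an AWS Batch job. The function name, the `AWS_CLI` override, the `issue` parameter and the `<JOB_QUEUE>` / `<JOB_DEFINITION>` placeholders are hypothetical and are not taken from the actual deployment.

```shell
# Sketch only: submit one ML task for a given GitHub issue number.
# AWS_CLI is a hypothetical override, handy for dry runs (AWS_CLI=echo).
# <JOB_QUEUE> and <JOB_DEFINITION> are placeholders to fill in.
submit_ml_task() {
  issue=$1

  # Refuse anything that is not a plain issue number.
  case "$issue" in
    ''|*[!0-9]*)
      echo "usage: submit_ml_task <issue-number>" >&2
      return 1
      ;;
  esac

  # --parameters values replace Ref::issue placeholders in the job definition.
  "${AWS_CLI:-aws}" batch submit-job \
    --job-name "needsdiagnosis-$issue" \
    --job-queue "<JOB_QUEUE>" \
    --job-definition "<JOB_DEFINITION>" \
    --parameters "issue=$issue"
}
```

Running `AWS_CLI=echo submit_ml_task 123` prints the command instead of calling AWS, which is a cheap way to review it before a real submission.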
Infrastructure as Code
All the infrastructure is managed as code and the codebase lives under mozilla/webcompat-ml-deploy.
To avoid over-complicating things, the terraform state is kept in the git repository, encrypted using git-crypt.
After each change, maintainers should make sure that the updated state is also checked in to the repository. The state also contains credentials, so it's important to always make sure that it is encrypted before pushing.
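One way to check this before pushing is to confirm that the state files are covered by the git-crypt filter in `.gitattributes`. The helper below is a minimal sketch: the function name is hypothetical, and it assumes the repository marks the state files with `filter=git-crypt`, which is how git-crypt is normally configured. It only inspects attributes, so `git-crypt status` remains the more direct check of what is actually encrypted.

```shell
# Sketch only: succeed if every given path has the git-crypt filter
# attribute set. Must be run from inside the git repository.
state_is_covered_by_git_crypt() {
  for f in "$@"; do
    # git check-attr prints "<path>: filter: <value>"
    attr=$(git check-attr filter -- "$f") || return 1
    case "$attr" in
      *": filter: git-crypt") ;;
      *)
        echo "not covered by git-crypt: $f" >&2
        return 1
        ;;
    esac
  done
}
```

For example, `state_is_covered_by_git_crypt terraform.tfstate terraform.tfstate.backup && git push` would refuse to push when either file is missing the filter attribute.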
All ML tasks should be described as a directory under docker/ (for example docker/needsdiagnosis) and should have the ML model prebundled.
Regular maintenance tasks
needsdiagnosis model dataset
$ webcompat-ml-needsdiagnosis build-dataset --es-url "<URL>" --es-index-name="<INDEX>" --es-doc-type="<TYPE>" --output "</path/to/dataset.csv>"
$ webcompat-ml-needsdiagnosis train --data "</path/to/dataset.csv>" --output "</path/to/model.bin>"
Releasing a new needsdiagnosis task image
$ cd webcompat-ml-deploy/docker/needsdiagnosis
$ docker build . -t ml-task:needsdiagnosis --build-arg MODEL_PATH="</path/to/model.bin>"
$ docker tag ml-task:needsdiagnosis mozillawebcompat/ml-task:needsdiagnosis
$ docker push mozillawebcompat/ml-task:needsdiagnosis
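Since every task lives in its own directory under docker/, the same release steps can be wrapped in a function that takes the task name. This is a sketch under assumptions: the function name and the `DEPLOY_REPO` and `DOCKER` overrides are hypothetical, and it assumes other tasks follow the same `ml-task:<task>` tagging scheme as needsdiagnosis.

```shell
# Sketch only: build, tag and push the image for one task directory.
# DEPLOY_REPO (checkout location) and DOCKER (docker binary) are
# hypothetical overrides added for this example.
release_task_image() {
  task=$1
  model_path=$2
  docker_cmd=${DOCKER:-docker}
  (
    # Run in a subshell so the caller's working directory is unchanged.
    cd "${DEPLOY_REPO:-webcompat-ml-deploy}/docker/$task" || exit 1
    "$docker_cmd" build . -t "ml-task:$task" --build-arg MODEL_PATH="$model_path" &&
      "$docker_cmd" tag "ml-task:$task" "mozillawebcompat/ml-task:$task" &&
      "$docker_cmd" push "mozillawebcompat/ml-task:$task"
  )
}
```

Calling `release_task_image needsdiagnosis "</path/to/model.bin>"` runs the same three docker commands as the manual steps, and stops at the first one that fails.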
Applying a terraform change
$ git-crypt unlock
$ terraform plan
$ terraform apply
$ git add .
$ git add terraform.tfstate
$ git add terraform.tfstate.backup
$ git commit -m '<change applied>'