As we saw in the previous Script Mode section, SageMaker includes a pre-built Scikit-Learn container. We recommend using the pre-built container in almost all cases that require a Scikit-Learn algorithm. In this lab, however, we will build a custom Scikit-Learn container to demonstrate the steps involved in packaging custom code and libraries into a container for use with SageMaker.
In this lab we will use AWS CodeBuild to build our Docker image and push it to the Amazon Elastic Container Registry (Amazon ECR).
We will bring our custom container into SageMaker by packaging its source files into a zip archive (e.g. codebuild__random_forest.zip) and uploading that archive to S3. This compressed file includes:
Upload the codebuild__random_forest.zip file into an S3 bucket (e.g.
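The packaging and upload steps can also be scripted. The following is a minimal sketch, not the workshop's own tooling: the helper names package_sources and upload_archive are hypothetical, and the sketch simply zips whatever files are in the directory you point it at. The upload helper assumes boto3 is installed and AWS credentials are configured.

```python
import zipfile
from pathlib import Path


def package_sources(source_dir, archive_path):
    """Zip every file under source_dir, storing paths relative to it.

    Returns the list of archive member names that were written.
    """
    source_dir = Path(source_dir)
    archive_path = Path(archive_path).resolve()
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            # Skip directories, and skip the archive itself if it was
            # created inside source_dir.
            if path.is_file() and path.resolve() != archive_path:
                arcname = path.relative_to(source_dir).as_posix()
                archive.write(path, arcname)
                names.append(arcname)
    return names


def upload_archive(archive_path, bucket, key):
    """Upload the archive to s3://bucket/key.

    Assumption: boto3 is installed and credentials are configured.
    The import is local so that packaging works without boto3.
    """
    import boto3

    boto3.client("s3").upload_file(str(archive_path), bucket, key)
```

For example, package_sources("container_src", "codebuild__random_forest.zip") followed by upload_archive("codebuild__random_forest.zip", "<BUCKET_NAME>", "<PREFIX>/codebuild__random_forest.zip"), where container_src is a placeholder for the directory holding your files.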
Before we can run our CodeBuild job, we first need to make sure that we have an ECR repository in which we will store our Docker images.
In the Amazon ECR console, click Create repository, and enter sagemaker-random-forest as the name of the repository. Take note of the full repository name, as you will need it in the next section.
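If you prefer to script this step, the sketch below shows one way to do it with boto3. It is an illustration rather than part of the lab: the function names are hypothetical, and the ECR client is passed in as a parameter so the logic can be tried without touching a real account.

```python
def image_uri(account_id, region, repository, tag="latest"):
    """Compose the full image name in the form ECR uses."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def ensure_repository(ecr_client, name):
    """Return the repository URI, creating the repository if it is missing.

    ecr_client is expected to behave like boto3.client("ecr").
    """
    try:
        found = ecr_client.describe_repositories(repositoryNames=[name])
        return found["repositories"][0]["repositoryUri"]
    except ecr_client.exceptions.RepositoryNotFoundException:
        created = ecr_client.create_repository(repositoryName=name)
        return created["repository"]["repositoryUri"]
```

With real credentials this could be called as ensure_repository(boto3.client("ecr"), "sagemaker-random-forest"). The returned URI, plus a tag such as latest, is the full image name used later in the lab.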
Now that we have uploaded our training script and Dockerfile to S3, the next step is to define a CodeBuild project. Remember, we'll be using a CodeBuild project to automate the Docker build process.
In a new tab, browse to AWS CodeBuild.
Click Create build project, and set the parameters:
Project Name: SM_BuildWorkshopContainer
Source
  Source provider: Amazon S3
  Bucket: <BUCKET_NAME>
  S3 object key or S3 folder: <PREFIX>/codebuild__random_forest.zip
Environment
  Operating system: Amazon Linux 2
  Runtime(s): Standard
  Image: amazonlinux2-x86_64-standard:3.0
  Privileged: Enable this flag to build Docker images
  Role name: codebuild-SM_BuildWorkshopContainer-service-role
Set the Environment Variables:
AWS_DEFAULT_REGION: us-east-1
ALGORITHM_NAME: sagemaker-random-forest
IMAGE_NAME: <ACCOUNT_NO>.dkr.ecr.us-east-1.amazonaws.com/sagemaker-random-forest:latest
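The same project settings can be expressed as the request that the CodeBuild API expects. The sketch below only builds the parameter dictionary. The function name is hypothetical, the compute type is an assumption (the lab does not specify one), and the service role ARN must be supplied by you because its exact path depends on how the role was created.

```python
def build_project_params(bucket, prefix, account_id, service_role_arn,
                         region="us-east-1",
                         repository="sagemaker-random-forest"):
    """Return keyword arguments for the CodeBuild create_project call."""
    env_vars = {
        "AWS_DEFAULT_REGION": region,
        "ALGORITHM_NAME": repository,
        "IMAGE_NAME": (
            f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:latest"
        ),
    }
    return {
        "name": "SM_BuildWorkshopContainer",
        "source": {
            "type": "S3",
            # For S3 sources the location is "bucket/path/to/object.zip".
            "location": f"{bucket}/{prefix}/codebuild__random_forest.zip",
        },
        "artifacts": {"type": "NO_ARTIFACTS"},
        "environment": {
            "type": "LINUX_CONTAINER",
            "image": "aws/codebuild/amazonlinux2-x86_64-standard:3.0",
            # Assumption: the smallest general-purpose compute type.
            "computeType": "BUILD_GENERAL1_SMALL",
            # Privileged mode is required to run Docker inside the build.
            "privilegedMode": True,
            "environmentVariables": [
                {"name": name, "value": value, "type": "PLAINTEXT"}
                for name, value in env_vars.items()
            ],
        },
        "serviceRole": service_role_arn,
    }
```

With boto3 installed and credentials configured, the result could be passed on as boto3.client("codebuild").create_project(**params).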
Before you start building the Docker image, give AWS CodeBuild permission to push the image to the ECR repository.
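Attach the AmazonEC2ContainerRegistryFullAccess policy to the CodeBuild service role (codebuild-SM_BuildWorkshopContainer-service-role), placing a checkmark next to each policy you attach.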
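The permission step can be sketched in code as well. This is an illustrative helper, not the lab's procedure: the function name is hypothetical, the IAM client is passed in so it can be replaced by a stub, and the sketch ignores pagination of the attached-policy list.

```python
# ARN of the AWS managed policy named in this lab.
ECR_FULL_ACCESS_ARN = (
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess"
)


def grant_ecr_access(iam_client, role_name):
    """Attach the ECR full-access policy to role_name if it is not attached.

    iam_client is expected to behave like boto3.client("iam").
    Returns True if the policy was attached by this call, False if it
    was already present.
    """
    attached = iam_client.list_attached_role_policies(RoleName=role_name)
    arns = {policy["PolicyArn"] for policy in attached["AttachedPolicies"]}
    if ECR_FULL_ACCESS_ARN in arns:
        return False
    iam_client.attach_role_policy(
        RoleName=role_name, PolicyArn=ECR_FULL_ACCESS_ARN
    )
    return True
```

A call might look like grant_ecr_access(boto3.client("iam"), "codebuild-SM_BuildWorkshopContainer-service-role"). Checking first keeps the helper safe to run more than once.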
Click Start build to build the Docker image and push it into the repository. Monitor your ongoing build job (try clicking the first link under ‘Build run’ to view the log files). When your container image has been created, the ‘Latest Build Status’ column will show as
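Starting the build and waiting for it to finish can also be done programmatically. The sketch below is one possible approach, with a hypothetical function name and an assumed polling interval. It relies on the build status values reported by the CodeBuild API and takes the client and the sleep function as parameters so it can be exercised without AWS.

```python
import time

# Build statuses after which the CodeBuild API reports no further change.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "FAULT", "TIMED_OUT", "STOPPED"}


def run_build(codebuild_client, project_name, poll_seconds=15,
              sleep=time.sleep):
    """Start a build and poll until it reaches a terminal status.

    codebuild_client is expected to behave like boto3.client("codebuild").
    Returns the final status string, for example "SUCCEEDED".
    """
    started = codebuild_client.start_build(projectName=project_name)
    build_id = started["build"]["id"]
    while True:
        response = codebuild_client.batch_get_builds(ids=[build_id])
        status = response["builds"][0]["buildStatus"]
        if status in TERMINAL_STATUSES:
            return status
        # Still IN_PROGRESS: wait before asking again.
        sleep(poll_seconds)
```

For instance, run_build(boto3.client("codebuild"), "SM_BuildWorkshopContainer") blocks until the build ends. Any result other than "SUCCEEDED" means the log files are worth a look.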
Once the CodeBuild job is complete, click the ECR link above. If your CodeBuild job was successful in pushing your new image into ECR, you will see a new image with the image tag latest. Under the Pushed at column for this image, you should see that this image was pushed very recently.
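The same check can be made from code. The sketch below is an assumption-laden illustration: the function name and the 30-minute window are hypothetical choices, and the ECR client is passed in so that a stub can stand in for the real service.

```python
from datetime import datetime, timedelta, timezone


def latest_image_pushed_recently(ecr_client, repository,
                                 max_age=timedelta(minutes=30), now=None):
    """Return True if the image tagged "latest" was pushed within max_age.

    ecr_client is expected to behave like boto3.client("ecr"). If no
    image carries the tag, the real client raises an exception, which
    this sketch lets propagate.
    """
    response = ecr_client.describe_images(
        repositoryName=repository,
        imageIds=[{"imageTag": "latest"}],
    )
    # boto3 returns imagePushedAt as a timezone-aware datetime.
    pushed_at = response["imageDetails"][0]["imagePushedAt"]
    if now is None:
        now = datetime.now(timezone.utc)
    return now - pushed_at <= max_age
```

For example, latest_image_pushed_recently(boto3.client("ecr"), "sagemaker-random-forest") should return True shortly after a successful build.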
Now, let's open our notebook and begin training with the custom container you built:
Browse to the MLAI/BYOC/scikit_bring_your_own folder. Double click on scikit_bring_your_own.ipynb to open it.