In the last section we used a real-time inference endpoint. Real-time endpoints are useful when we need predictions as part of a low-latency system. For example, we might want to provide a customer churn prediction while a call center agent is still talking to the customer on the phone. That way, we can provide help to the agent if the customer is at risk of churn, such as providing a sales incentive over the phone.
However, in other cases, we don't need predictions in real-time. With recommendation systems, for example, we might want to generate a very large number of predictions and store them in a NoSQL database for use later on. Or, we may want to generate a sales forecast for the next seven days. In these cases, the immediate context doesn't affect the prediction, and we can use a SageMaker batch transformation job to get a large number of predictions at once. Using a batch transformation is often more cost-effective than keeping a prediction endpoint on all the time.
Download the template file for an endpoint. Make the following changes:
Models
.Run:
kubectl apply -f tf-batch.yaml
You can check on job status by running:
kubectl get batchtransformjob
You can see more details about a specific job with:
kubectl describe batchtransformjob <JOBNAME>
Finally, you can see detailed job logs with:
kubectl smlogs batchtransformjob <JOBNAME>
Download the example notebook. In your notebook, click Upload
and select this notebook. Click Upload
again to complete the upload.
Now click the imported notebook to open it. Read through and execute each cell in the notebook, making sure to edit the variables in the first code cell.
After you finish executing this notebook, you'll see metrics like precision and recall for our model. Note that the results are the same as when we got predictions from the real-time endpoint in the last chapter, but now we only had to make one call to SageMaker, rather than using the real-time endpoint.