zementisr is an R client for the Zementis Server API. Zementis Server is an execution engine for PMML models that also provides model management capabilities.
In this quickstart guide, we show how to use zementisr to deploy PMML models to Zementis Server, predict new values by sending data to the server, and manage the entire PMML model life cycle without leaving your preferred R development environment.
Zementis Server’s REST API uses HTTP Basic Authentication. For each request the client needs to provide username and password.
Since typing your password in the console is risky (you might accidentally share the .Rhistory file) and being asked for it on every request quickly becomes cumbersome, the zementisr package requires that you store your credentials and the base URL of your Zementis Server as environment variables in the .Renviron file in your home directory.
Please make sure the environment variables for your username, password, and base URL are set in your .Renviron file before using functions from the zementisr package. You can easily edit .Renviron using usethis::edit_r_environ().
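Before calling any zementisr function, it can help to confirm that the variables are actually visible to your R session. The following is a minimal sketch using base R only. The three variable names are assumptions made for illustration and are not confirmed by this guide; substitute the names that the zementisr documentation specifies.

```r
# Assumed variable names, for illustration only; replace them with the
# names listed in the zementisr documentation.
required_vars <- c("ZEMENTIS_base_url", "ZEMENTIS_usr", "ZEMENTIS_pwd")

# Sys.getenv() returns "" for variables that are not set.
values <- Sys.getenv(required_vars)
missing_vars <- required_vars[!nzchar(values)]

if (length(missing_vars) > 0) {
  message("Not set in .Renviron: ", paste(missing_vars, collapse = ", "))
} else {
  message("All required variables are set.")
}
```

Remember to restart R after editing .Renviron, since the file is only read at startup.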
Before we get started using the zementisr package, we will create two simple prediction models and convert them to PMML using pmml() from the pmml package. The first PMML model will be saved to disk:
library(rpart)
library(pmml)

# Linear regression on iris, converted to PMML and saved to disk
iris_lm <- lm(Sepal.Length ~ ., data = iris)
iris_pmml <- pmml(iris_lm, model_name = "iris_model")
saveXML(iris_pmml, "iris_pmml.xml")
#> [1] "iris_pmml.xml"

# Decision tree on kyphosis (a data set shipped with rpart), kept in memory
kyphosis_fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
kyphosis_pmml <- pmml(kyphosis_fit, model_name = "kyphosis_model")
Now, we will start using functions from the zementisr package. We will begin with upload_model() to upload our PMML models to the server.
upload_model() accepts either a path to a PMML file on disk or an XMLNode object created with pmml::pmml(). Below we will demonstrate both options to upload the two models to the server. A successful upload always returns a list with the model name and its activation status.
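The two upload options described above might look as follows. This is a sketch that continues from the models created earlier and assumes a reachable Zementis Server with your credentials configured in .Renviron; the returned lists are not reproduced here.

```r
library(zementisr)

# Option 1: upload the PMML file that was saved to disk earlier.
upload_model("iris_pmml.xml")

# Option 2: upload the in-memory object returned by pmml::pmml().
# kyphosis_pmml is the object created in the previous step.
upload_model(kyphosis_pmml)
```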
After deployment you might be interested in how many PMML models are currently deployed to Zementis Server:
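A short sketch of how this could be checked with get_models(), which is used later in this guide. It assumes a running server and that get_models() returns a character vector of model names, as its later use with purrr::map() suggests.

```r
library(zementisr)

# Names of the models currently deployed to the server
# (assumed to be a character vector).
deployed <- get_models()
deployed

# Number of deployed models.
length(deployed)
```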
Use get_model_properties() to get the PMML model’s name, description, input and output field properties:
get_model_properties("kyphosis_model")
#> $modelName
#> [1] "kyphosis_model"
#>
#> $description
#> [1] "RPart Decision Tree Model"
#>
#> $creationDate
#> [1] "2020-01-07 20:52:18"
#>
#> $isActive
#> [1] TRUE
#>
#> $inputFields
#> name type usage
#> 1 Age DOUBLE ACTIVE
#> 2 Number DOUBLE ACTIVE
#> 3 Start DOUBLE ACTIVE
#>
#> $outputFields
#> name type usage
#> 1 Predicted_Kyphosis STRING OUTPUT
#> 2 Probability_absent DOUBLE OUTPUT
#> 3 Probability_present DOUBLE OUTPUT
If you would like to deactivate a PMML model without removing it from the server, do the following:
deactivate_model("iris_model")
#> $model_name
#> [1] "iris_model"
#>
#> $is_active
#> [1] FALSE
deactivate_model("kyphosis_model")
#> $model_name
#> [1] "kyphosis_model"
#>
#> $is_active
#> [1] FALSE
You can even add some magrittr and purrr flavor to chain several zementisr functions together. For instance, the following line of code lets you activate all your PMML models at once:
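A minimal sketch of such a chain, assuming a running server. The function activate_model() is an assumption: it is taken to be the counterpart of deactivate_model() and is not shown elsewhere in this guide, so check the package reference before relying on it.

```r
library(zementisr)
library(magrittr)  # provides the %>% pipe

# get_models() is expected to return the names of all deployed models;
# purrr::map() then calls the activation function once per name.
# activate_model() is an assumed counterpart of deactivate_model().
get_models() %>% purrr::map(activate_model)
```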
If you would like to predict a single new input record, use predict_pmml(), which needs a one-row data frame as its data input and the name of the deployed PMML model that should produce the prediction. If executed successfully, predict_pmml() returns a list with the following components:
model: A length-one character vector containing the name of the PMML model that was executed on the server.
outputs: A data frame containing the prediction results. The values returned depend on the type of prediction model being executed on the server. You can spot the difference between a regression and a classification model in the output below:
predict_pmml(iris[42, ], "iris_model")
#> $model
#> [1] "iris_model"
#>
#> $outputs
#> Predicted_Sepal.Length
#> 1 4.295281
predict_pmml(kyphosis[23, ], "kyphosis_model")
#> $model
#> [1] "kyphosis_model"
#>
#> $outputs
#> Probability_present Probability_absent Predicted_Kyphosis
#> 1 0.5714286 0.4285714 present
If you would like to predict multiple new input records at once, use predict_pmml_batch(), which accepts data frames, .csv and .json files as data input. .csv and .json files can even be sent in compressed format (.zip or .gzip).
predict_pmml_batch(iris[23:25, ], "iris_model")
#> $model
#> [1] "iris_model"
#>
#> $outputs
#> Predicted_Sepal.Length
#> 1 4.722679
#> 2 5.059837
#> 3 5.369821
jsonlite::write_json(iris[23:25, ], "iris.json")
predict_pmml_batch("iris.json", "iris_model")
#> $model
#> [1] "iris_model"
#>
#> $outputs
#> Predicted_Sepal.Length
#> 1 4.722679
#> 2 5.059837
#> 3 5.369821
write.csv(iris[23:25, ], "iris.csv", row.names = FALSE)
predict_pmml_batch("iris.csv", "iris_model")
#> $model
#> [1] "iris_model"
#>
#> $outputs
#> Predicted_Sepal.Length
#> 1 4.722679
#> 2 5.059837
#> 3 5.369821
As the output above shows, predict_pmml_batch() also returns a list with the two components model and outputs.
download_model() lets you download the PMML source of a deployed model. You might choose to download the PMML model source before deleting the model permanently from the server with delete_model(), which is described in the next section. download_model() returns a list with two components:
model_name: the name of the downloaded model, including the suffix “.pmml”
model_source: the model source, represented as an S3 object of class XMLInternalDocument created by parsing the server response using XML::xmlParse()
After downloading the model of your choice, you can use XML::saveXML() to store it on disk:
iris_download <- download_model("iris_model")
XML::saveXML(iris_download[["model_source"]], file = iris_download[["model_name"]])
Again using some tidyverse ingredients, you can easily download all deployed models at once and store them in a data frame:
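library(magrittr)  # provides the %>% pipe

# Download every deployed model; each element is a list with
# model_name and model_source
downloads <- get_models() %>% purrr::map(download_model)
tibble::tibble(
  model_name = purrr::map_chr(downloads, "model_name"),
  source = purrr::map(downloads, "model_source"))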
#> # A tibble: 2 x 2
#> model_name source
#> <chr> <list>
#> 1 iris_model.pmml <XMLIntrD>
#> 2 kyphosis_model.pmml <XMLIntrD>
If you would like to store the downloaded models on disk instead, do this:
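One way to write every downloaded model to its own file is to walk over the list and call XML::saveXML() on each element. So that it runs without a server, the sketch below uses a stand-in list that mimics the shape returned by download_model() (a model_name plus an XMLInternalDocument); the stand-in document and the output directory are assumptions for illustration. With a real server you would use the downloads object from the previous step instead.

```r
library(XML)
library(purrr)

# Stand-in for the result of mapping download_model() over get_models():
# each element holds a model_name and an XMLInternalDocument model_source.
downloads <- list(
  list(
    model_name = "example_model.pmml",
    model_source = xmlParse("<PMML version=\"4.4\"/>", asText = TRUE)
  )
)

# Illustrative target directory; use any directory you like.
out_dir <- tempdir()

# Write each model source to a file named after the model.
walk(downloads, function(d) {
  saveXML(d[["model_source"]], file = file.path(out_dir, d[["model_name"]]))
})

# Confirm that the file was written.
file.exists(file.path(out_dir, "example_model.pmml"))
```

purrr::walk() is used rather than purrr::map() because the files are written for their side effect and the return values are not needed.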
After a PMML model has reached the end of its life cycle, you might want to remove it from the server using delete_model(), which always returns a character vector with the names of the models still deployed to the server:
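A short sketch of such a call, assuming a running server on which iris_model is still deployed. The output is not shown because it depends on what else is deployed.

```r
library(zementisr)

# Permanently remove iris_model from the server. According to the
# description above, the return value names the models that remain.
remaining <- delete_model("iris_model")
remaining
```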
... argument
Each function from the zementisr package comes with a ... (dot-dot-dot) argument. It is used to pass additional arguments on to the underlying HTTP method from the httr package. This might be necessary if you need to set some curl options explicitly via httr::config().
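As an illustration, a curl option can be wrapped in an httr::config() object and handed to a zementisr function. The sketch below only builds the object, so it runs without a server. The choice of the connecttimeout option and the commented-out get_models() call are assumptions about how you might use it, not behavior confirmed by this guide.

```r
library(httr)

# Example curl option: give up if no connection is made within 5 seconds.
opts <- config(connecttimeout = 5)

# With a running server, the object could then be passed through `...`,
# for example:
# get_models(opts)
```

httr::httr_options() lists the curl options that httr::config() accepts.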