predict_pmml_batch() returns predictions for multiple input records sent to Zementis Server. The values returned depend on the type of prediction model executed on the server.

predict_pmml_batch(
  data,
  model_name,
  path = NULL,
  max_threads = NULL,
  max_records_per_thread = 5000,
  ...
)

Arguments

data

Either a data frame or the path to a file containing multiple data records that are sent to Zementis Server for prediction. Files must be .csv or .json files. Alternatively, .csv and .json files can be sent in compressed format (.zip or .gzip). For compressed files you must also set the path argument.

model_name

The name of the deployed PMML model used to score the new data records contained in data.

path

Path to a file to which the response from Zementis Server is written. Required only if a compressed input file (.zip) is passed to data.

max_threads

Maximum number of concurrent threads used to process the data that is sent. Defaults to twice the number of processor cores.

max_records_per_thread

Maximum number of records processed by a single thread. Default value is 5000.

...

Additional arguments passed on to the underlying HTTP method. This might be necessary if you need to set some curl options explicitly via config.
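For compressed input, base R can produce a gzip-compressed .csv without any external zip utility. A minimal sketch (the file name and use of tempdir() are illustrative; as noted in the data description above, compressed input also requires the path argument when calling predict_pmml_batch()):

```r
# Write iris as a gzip-compressed CSV using a gzfile() connection (base R).
csv_gz <- file.path(tempdir(), "iris.csv.gz")
con <- gzfile(csv_gz, "w")
write.csv(iris, con, row.names = FALSE)
close(con)

# Reading it back confirms the records survived the round trip;
# read.csv() transparently decompresses gzip files.
roundtrip <- read.csv(csv_gz)
nrow(roundtrip)  # 150
```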

Value

If data is a data frame, a .csv file or a .json file, a list with the following components:

  • model A length one character vector containing the model_name

  • outputs A data frame containing the prediction results for data

If data is a compressed file (.zip), a compressed .json file is saved to path and a 200 HTTP status code is returned invisibly. When uncompressed and read into R, the file saved to path contains a list with the two components described above.

For regression models, outputs will be a 1-column data frame with the predicted values.

For binary classification models, outputs will be a 3-column data frame containing the probability of class 0, the probability of class 1, and the predicted class label based on a 50% threshold.
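As a shape illustration, the list returned for a binary classification model can be mocked locally like this (the model name and column names are illustrative only; a real call against a running Zementis Server would be result <- predict_pmml_batch(new_data, "churn_model")):

```r
# Mock of the documented return structure: a list with `model` and `outputs`.
result <- list(
  model = "churn_model",                    # length-one character vector
  outputs = data.frame(                     # one row per input record
    probability_0   = c(0.82, 0.35),
    probability_1   = c(0.18, 0.65),
    predicted_class = c(0, 1)               # label from the 50% threshold
  )
)

result$model           # "churn_model"
nrow(result$outputs)   # 2
```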

Details

When calling predict_pmml_batch(), data is sent to Zementis Server as an octet stream. That means batch data is sent in streaming mode, and processing/scoring starts as soon as the first chunk reaches the server. By default, the server processes records in batches of 5000 records per thread, using at most 2*n threads to process the entire batch, where n is the number of cores available on the machine.

The two function arguments max_threads and max_records_per_thread let you tune the compute resources the server uses for your request. max_threads lets you reserve additional threads (CPU resources), while max_records_per_thread changes the number of records processed by a single thread (memory resources).
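The resulting partitioning can be sketched with a bit of arithmetic (the core count of 8 is assumed for illustration; the actual value depends on the server machine):

```r
n_cores <- 8                      # assumed server core count
max_threads <- 2 * n_cores        # default: twice the number of cores
max_records_per_thread <- 5000    # default records per thread

n_records <- 150000               # size of the batch being scored

# The batch is split into chunks of at most max_records_per_thread records;
# the number of threads actually used is capped by max_threads.
n_chunks <- ceiling(n_records / max_records_per_thread)
threads_used <- min(n_chunks, max_threads)

n_chunks      # 30
threads_used  # 16
```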

Examples

if (FALSE) {
# Predict the entire iris data set
predict_pmml_batch(iris, "iris_model")

# Predict the entire iris data set previously saved to a .json file
jsonlite::write_json(iris, "iris.json")
predict_pmml_batch("iris.json", "iris_model")

# Predict the entire iris data set previously saved to a .csv file
write.csv(iris, "iris.csv", row.names = FALSE)
predict_pmml_batch("iris.csv", "iris_model")

# Predict the entire iris data set previously saved and compressed
predict_pmml_batch("iris.csv.zip", "iris_model", "iris_predictions.zip")
unzipped_predictions <- unzip("iris_predictions.zip")
jsonlite::fromJSON(unzipped_predictions)
}