`predict_pmml_batch()` returns the predictions for multiple input records that are sent to Zementis Server. The values returned depend on the type of prediction model being executed on the server.
```r
predict_pmml_batch(
  data,
  model_name,
  path = NULL,
  max_threads = NULL,
  max_records_per_thread = 5000,
  ...
)
```
| Argument | Description |
|---|---|
| `data` | Either a data frame or the path to a file containing multiple data records that are sent to Zementis Server for prediction. Files must be in .csv, .json, or compressed .zip format. |
| `model_name` | The name of the deployed PMML model that scores the new data records contained in `data`. |
| `path` | Path to a file to which the response from Zementis Server is written. Only mandatory if compressed input files (.zip) are sent. |
| `max_threads` | Maximum number of concurrent threads used to process the data that is sent. The default is twice the number of processor cores. |
| `max_records_per_thread` | Maximum number of records processed by a single thread. The default is 5000. |
| `...` | Additional arguments passed on to the underlying HTTP method, for example to set some curl options explicitly (see the sketch below this table). |
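As a sketch of the `...` pass-through, the example below assumes the underlying HTTP layer is httr, so curl options can be supplied via `httr::config()` and friends; if the package uses a different HTTP client, the exact helpers will differ.

```r
# Hypothetical sketch: assumes predict_pmml_batch() forwards `...` to an
# httr verb, so config objects can be passed directly in the call.
# Verify against the package's actual HTTP backend.
library(httr)

predictions <- predict_pmml_batch(
  iris,
  "iris_model",
  httr::config(timeout = 60),  # curl option: abort the request after 60 s
  httr::verbose()              # curl option: log request/response details
)
```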
If `data` is a data frame, a .csv file, or a .json file, a list with the following components:

- `model`: a length-one character vector containing the `model_name`.
- `outputs`: a data frame containing the prediction results for `data`.

If `data` is a compressed file (.zip), a compressed .json file saved to `path` and an invisible 200 HTTP status code. Once uncompressed and read into R, the file saved to `path` yields a list with the two components described above.
For regression models, `outputs` is a one-column data frame containing the predicted values.

For binary classification models, `outputs` is a three-column data frame containing the probability of class 0, the probability of class 1, and the predicted class label based on a 50% threshold.
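As a quick illustration of this return shape, the sketch below calls the function on a data frame and inspects both components; `"iris_model"` is a hypothetical model name, and the columns of `outputs` depend on the model deployed on the server:

```r
# "iris_model" is a hypothetical name; adjust to a model actually deployed.
result <- predict_pmml_batch(iris, "iris_model")

result$model          # length-one character vector, e.g. "iris_model"
head(result$outputs)  # data frame of prediction results, one row per record

# For a binary classifier, the three columns would hold P(class 0),
# P(class 1), and the predicted label at the 50% threshold.
```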
When calling `predict_pmml_batch()`, data is sent to Zementis Server using octet streams. That means batch data is sent in stream mode, and processing/scoring starts as soon as the first chunk of the stream reaches the server. By default, the server processes records in batches of 5000 records per thread, with a maximum of 2*n threads to process the entire batch, where n is the number of available cores on the machine.

Using the two function arguments `max_threads` and `max_records_per_thread`, you can adjust the compute resources the server devotes to your data processing needs: `max_threads` lets you reserve additional threads for your request (CPU resources), while `max_records_per_thread` lets you modify the number of records processed by a single thread (memory resources).
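For example, a large batch might be tuned along these lines (a sketch with illustrative values; `"big_batch.csv"` and `"big_model"` are hypothetical, and suitable settings depend on the server's hardware):

```r
# Illustrative tuning sketch: reserve up to 8 threads and let each thread
# score 10000 records at a time.
predictions <- predict_pmml_batch(
  "big_batch.csv",                # hypothetical input file with many records
  "big_model",                    # hypothetical deployed model name
  max_threads = 8,                # more CPU resources for this request
  max_records_per_thread = 10000  # larger per-thread batches (more memory)
)
```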
```r
if (FALSE) {
  # Predict the entire iris data set
  predict_pmml_batch(iris, "iris_model")

  # Predict the entire iris data set previously saved to a .json file
  jsonlite::write_json(iris, "iris.json")
  predict_pmml_batch("iris.json", "iris_model")

  # Predict the entire iris data set previously saved to a .csv file
  write.csv(iris, "iris.csv", row.names = FALSE)
  predict_pmml_batch("iris.csv", "iris_model")

  # Predict the entire iris data set previously saved and compressed
  predict_pmml_batch("iris.csv.zip", "iris_model", "iris_predictions.zip")
  unzipped_predictions <- unzip("iris_predictions.zip")
  jsonlite::fromJSON(unzipped_predictions)
}
```