predict_pmml_batch.Rd
Source: R/predict_pmml_batch.R
predict_pmml_batch() returns the predictions for multiple input records that are sent to Zementis Server. The values returned depend on the type of prediction model being executed on the server.

Usage

predict_pmml_batch(
  data,
  model_name,
  path = NULL,
  max_threads = NULL,
  max_records_per_thread = 5000,
  ...
)
Arguments

data
    Either a data frame or a path to a file that contains multiple data records that are sent to Zementis Server for prediction. Files must be .csv or .json files, which may also be compressed (.zip).

model_name
    The name of the deployed PMML model that gets predictions on the new data records contained in data.

path
    Path to a file to which the response from Zementis Server is written. Only mandatory if compressed input files (.zip) are sent to the server.

max_threads
    Maximum number of concurrent threads used to process the data that is sent. Default value is twice the number of processor cores.

max_records_per_thread
    Maximum number of records processed by a single thread. Default value is 5000.

...
    Additional arguments passed on to the underlying HTTP method. This might be necessary if you need to set some curl options explicitly.
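As an illustration of the ... pass-through, the sketch below assumes (this page does not confirm it) that the underlying HTTP calls are made with httr, so a curl option such as a request timeout could be supplied as an httr config object:

# Sketch only: assumes httr is the underlying HTTP client, which this
# page does not confirm. httr::timeout() builds a curl config object
# that would be forwarded through `...` to the HTTP call.
predict_pmml_batch(iris, "iris_model", httr::timeout(60))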
Value

If data is a data frame, a .csv file or a .json file, a list with the following components:

model
    A length-one character vector containing the model_name.

outputs
    A data frame containing the prediction results for data.

If data is a compressed file (.zip), a compressed .json file saved to path and an invisible 200 HTTP status code. Once uncompressed and read into R, the file saved to path will be a list with the two components described above.
For regression models, outputs will include a 1-column data frame with the predicted values.

For binary classification models, outputs will include a 3-column data frame containing the probability of class 0, the probability of class 1, and the predicted class label based on a 50% threshold.
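A minimal sketch of inspecting the returned list; the outputs column names mentioned in the comments are hypothetical, since the actual names depend on the deployed PMML model:

result <- predict_pmml_batch(iris, "iris_model")
result$model          # length-one character vector, e.g. "iris_model"
head(result$outputs)  # data frame with the prediction results
# For a hypothetical binary classifier, outputs might contain columns such
# as a probability for class 0, a probability for class 1, and the
# predicted class label.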
Details

When calling predict_pmml_batch(), data is sent to Zementis Server using octet streams. That means batch data is sent in stream mode, and processing/scoring starts when the first chunk of the stream hits the server. By default, the server processes records in a batch size of 5000 records per thread, using a maximum of 2*n threads to process the entire batch, where n is the number of available cores on the machine.

Using the two function arguments max_threads and max_records_per_thread, you can adjust the compute resources on the server to your data processing needs: max_threads lets you reserve additional threads for your request (CPU resources), while max_records_per_thread lets you modify the number of records processed by a single thread (memory resources).
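For example, a request could reserve more threads while lowering the per-thread record count (the values below are purely illustrative):

# Illustrative values only: reserve up to 8 threads (more CPU) and cap
# each thread at 1000 records (less memory per thread).
predict_pmml_batch(
  iris,
  "iris_model",
  max_threads = 8,
  max_records_per_thread = 1000
)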
Examples

if (FALSE) {
# Predict the entire iris data set
predict_pmml_batch(iris, "iris_model")

# Predict the entire iris data set previously saved to a .json file
jsonlite::write_json(iris, "iris.json")
predict_pmml_batch("iris.json", "iris_model")

# Predict the entire iris data set previously saved to a .csv file
write.csv(iris, "iris.csv", row.names = FALSE)
predict_pmml_batch("iris.csv", "iris_model")

# Predict the entire iris data set previously saved and compressed
zip("iris.csv.zip", "iris.csv")  # create the compressed input file first
predict_pmml_batch("iris.csv.zip", "iris_model", "iris_predictions.zip")
unzipped_predictions <- unzip("iris_predictions.zip")
jsonlite::fromJSON(unzipped_predictions)
}