部署多個模型
(1)直接部署兩個模型faster-rcnn與retina,構建代碼的文件夾。
文件夾結構為:
mutimodel
├── faster_rcnn
│ └── 1
│ ├── assets
│ ├── saved_model.pb
│ └── variables
│ ├── variables.data-00000-of-00001
│ └── variables.index
├── model.config
├── retina
│ └── 1
│ ├── assets
│ ├── saved_model.pb
│ └── variables
│ ├── variables.data-00000-of-00001
│ └── variables.index
model.config的內容為:
model_config_list {
config {
name: 'rcnn',
model_platform: "tensorflow",
base_path: '/models/mutimodel/faster_rcnn' # 這里的base_path是docker中的映射路徑
},
config {
name: 'retina',
model_platform: "tensorflow",
base_path: '/models/mutimodel/retina' # 這里的base_path是docker中的映射路徑
}
}
(2)啟動docker
sudo docker run -p 8501:8501 -p 8500:8500 --mount type=bind,source=/home/techi/techi/code/model_saved_files/mutimodel,target=/models/mutimodel -t tensorflow/serving --model_config_file=/models/mutimodel/model.config
其中,target為docker掛載地址,model_config_file對應的地址也是掛載地址
編寫客戶端代碼
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
options = [('grpc.max_send_message_length', 1000 * 1024 * 1024),
('grpc.max_receive_message_length', 1000 * 1024 * 1024)]
channel = grpc.insecure_channel('xxx', options=options)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
(_, _), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_test = tf.reshape(x_test, (10000, -1))
# print(x_test)
x_test = tf.cast(x_test, dtype=tf.float32)
x_test = (x_test - 127.5) / 127.5
request = predict_pb2.PredictRequest()
request.model_spec.name = "yoloV3"
request.model_spec.signature_name = 'serving_default'
request.inputs['input_1'].CopyFrom(tf.make_tensor_proto(x_test[:20], shape=(20, 784))) # 在遠程傳輸中,使用tensorproto格式進行傳輸
response = stub.Predict(request, 10.0)
output = tf.make_ndarray(response.outputs['dense_2']) # 這里的名稱使用 saved_model_cli show --dir='相對路徑/絕對路徑' --all 進行查看
output = tf.argmax(output, axis=-1)
print(output)
print(y_test[:20])
在docker hub中有四中鏡像。
:latest
:lasest-gpu
:lasest-devel
:lasest-devel-gpu
兩種划分共有4種鏡像。CPU、GPU、穩定版、開發版。穩定版在dockerfile中啟動tensorflow_model_server服務,創建容器即啟動服務。開發版需要在創建容器之后,進入容器,手動啟動tensorflow_model_server服務。
如果需要修改相關參數配置,可以通過如下命令:
docker run -p 8501:8501 -v /root/mutimodel/linear_model:/models/linear_model -t --entrypoint=tensorflow_model_server tensorflow/serving --port=8500 --model_name=linear_model --model_base_path=/models/linear_model --rest_api_port=8501
-t --entrypoint=tensorflow_model_server tensorflow/serving:如果使用穩定版的docker,啟動docker之后是不能進入容器內部bash環境的,–entrypoint的作用是允許你“間接”進入容器內部,然后調用tensorflow_model_server命令來啟動TensorFlow Serving,這樣才能輸入后面的參數。
tensorflow serving的詳細參數如下:
Flags:
--port=8500 int32 Port to listen on for gRPC API
--grpc_socket_path="" string If non-empty, listen to a UNIX socket for gRPC API on the given path. Can be either relative or absolute path.
--rest_api_port=0 int32 Port to listen on for HTTP/REST API. If set to zero HTTP/REST API will not be exported. This port must be different than the one specified in --port.
--rest_api_num_threads=16 int32 Number of threads for HTTP/REST API processing. If not set, will be auto set based on number of CPUs.
--rest_api_timeout_in_ms=30000 int32 Timeout for HTTP/REST API calls.
--enable_batching=false bool enable batching
--batching_parameters_file="" string If non-empty, read an ascii BatchingParameters protobuf from the supplied file name and use the contained values instead of the defaults.
--model_config_file="" string If non-empty, read an ascii ModelServerConfig protobuf from the supplied file name, and serve the models in that file. This config file can be used to specify multiple models to serve and other advanced parameters including non-default version policy. (If used, --model_name, --model_base_path are ignored.)
--model_name="default" string name of model (ignored if --model_config_file flag is set)
--model_base_path="" string path to export (ignored if --model_config_file flag is set, otherwise required)
--max_num_load_retries=5 int32 maximum number of times it retries loading a model after the first failure, before giving up. If set to 0, a load is attempted only once. Default: 5
--load_retry_interval_micros=60000000 int64 The interval, in microseconds, between each servable load retry. If set negative, it doesn't wait. Default: 1 minute
--file_system_poll_wait_seconds=1 int32 Interval in seconds between each poll of the filesystem for new model version. If set to zero poll will be exactly done once and not periodically. Setting this to negative value will disable polling entirely causing ModelServer to indefinitely wait for a new model at startup. Negative values are reserved for testing purposes only.
--flush_filesystem_caches=true bool If true (the default), filesystem caches will be flushed after the initial load of all servables, and after each subsequent individual servable reload (if the number of load threads is 1). This reduces memory consumption of the model server, at the potential cost of cache misses if model files are accessed after servables are loaded.
--tensorflow_session_parallelism=0 int64 Number of threads to use for running a Tensorflow session. Auto-configured by default.Note that this option is ignored if --platform_config_file is non-empty.
--tensorflow_intra_op_parallelism=0 int64 Number of threads to use to parallelize the executionof an individual op. Auto-configured by default.Note that this option is ignored if --platform_config_file is non-empty.
--tensorflow_inter_op_parallelism=0 int64 Controls the number of operators that can be executed simultaneously. Auto-configured by default.Note that this option is ignored if --platform_config_file is non-empty.
--ssl_config_file="" string If non-empty, read an ascii SSLConfig protobuf from the supplied file name and set up a secure gRPC channel
--platform_config_file="" string If non-empty, read an ascii PlatformConfigMap protobuf from the supplied file name, and use that platform config instead of the Tensorflow platform. (If used, --enable_batching is ignored.)
--per_process_gpu_memory_fraction=0.000000 float Fraction that each process occupies of the GPU memory space the value is between 0.0 and 1.0 (with 0.0 as the default) If 1.0, the server will allocate all the memory when the server starts, If 0.0, Tensorflow will automatically select a value.
--saved_model_tags="serve" string Comma-separated set of tags corresponding to the meta graph def to load from SavedModel.
--grpc_channel_arguments="" string A comma separated list of arguments to be passed to the grpc server. (e.g. grpc.max_connection_age_ms=2000)
--enable_model_warmup=true bool Enables model warmup, which triggers lazy initializations (such as TF optimizations) at load time, to reduce first request latency.
--version=false bool Display version
關於設置別名:
Please note that labels can only be assigned to model versions that are already loaded and available for serving. Once a model version is available, one may reload the model config on the fly to assign a label to it. This can be achieved using aHandleReloadConfigRequest RPC or if the server is set up to periodically poll the filesystem for the config file, as described above.
If you would like to assign a label to a version that is not yet loaded (for ex. by supplying both the model version and the label at startup time) then you must set the --allow_version_labels_for_unavailable_models flag to true, which allows new labels to be assigned to model versions that are not loaded yet.
docker run -p 8501:8501 -p 8500:8500 -v /root/mutimodel/:/models/mutimodel -t tensorflow/serving --model_config_file=/models/mutimodel/model.config --allow_version_labels_for_unavailable_models=true