4 Scripting Languages
This chapter provides a brief overview of scripting language extension programming and the mechanisms by which scripting language interpreters access C and C++ code.
4.1 The two-language view of the world
When a scripting language is used to control a C program, the resulting system tends to look as follows:
In this programming model, the scripting language interpreter is used for high-level control, whereas the underlying functionality of the C/C++ program is accessed through special scripting language "commands." If you have ever tried to write your own simple command interpreter, you might view the scripting language approach as a highly advanced implementation of that. Likewise, if you have ever used a package such as MATLAB or IDL, the model is very similar: the interpreter executes user commands and scripts, but most of the underlying functionality is written in a low-level language like C or Fortran.
The two-language model of computing is extremely powerful because it exploits the strengths of each language. C/C++ can be used for maximal performance and complicated systems programming tasks. Scripting languages can be used for rapid prototyping, interactive debugging, scripting, and access to high-level data structures such as associative arrays.
4.2 How does a scripting language talk to C?
Scripting languages are built around a parser that knows how to execute commands and scripts. Within this parser, there is a mechanism for executing commands and accessing variables. Normally, this is used to implement the builtin features of the language. However, by extending the interpreter, it is usually possible to add new commands and variables. To do this, most languages define a special API for adding new commands. Furthermore, a special foreign function interface defines how these new commands are supposed to hook into the interpreter.
Typically, when you add a new command to a scripting interpreter, you need to do two things. First, you need to write a special "wrapper" function that serves as the glue between the interpreter and the underlying C function. Then you need to give the interpreter information about the wrapper by providing details about the name of the function, its arguments, and so forth. The next few sections illustrate the process.
4.2.1 Wrapper functions
Suppose you have an ordinary C function like this:
int fact(int n) {
    if (n <= 1)
        return 1;
    else
        return n*fact(n-1);
}
In order to access this function from a scripting language, it is necessary to write a special "wrapper" function that serves as the glue between the scripting language and the underlying C function. A wrapper function must do three things:
- Gather function arguments and make sure they are valid.
- Call the C function.
- Convert the return value into a form recognized by the scripting language.
As an example, the Tcl wrapper function for the fact() function above might look like the following:
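#include <stdlib.h>   /* atoi */
#include <tcl.h>

/* The result is set through Tcl_SetObjResult(); writing to interp->result
   directly is a legacy style that current Tcl releases no longer support
   by default. */
int wrap_fact(ClientData clientData, Tcl_Interp *interp, int argc, const char *argv[]) {
    int result;
    int arg0;
    /* 1. Gather the arguments and check their count. */
    if (argc != 2) {
        Tcl_SetObjResult(interp, Tcl_NewStringObj("wrong # args", -1));
        return TCL_ERROR;
    }
    arg0 = atoi(argv[1]);
    /* 2. Call the C function. */
    result = fact(arg0);
    /* 3. Convert the return value into a Tcl result. */
    Tcl_SetObjResult(interp, Tcl_NewIntObj(result));
    return TCL_OK;
}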
Once you have created a wrapper function, the final step is to tell the scripting language about the new function. This is usually done in an initialization function called by the language when the module is loaded. For example, adding the above function to the Tcl interpreter requires code like the following:
int Wrap_Init(Tcl_Interp *interp) {
    Tcl_CreateCommand(interp, "fact", wrap_fact, (ClientData) NULL,
                      (Tcl_CmdDeleteProc *) NULL);
    return TCL_OK;
}
When executed, Tcl will now have a new command called "fact" that you can use like any other Tcl command.
Although the process of adding a new function to Tcl has been illustrated, the procedure is almost identical for Perl and Python. Both require special wrappers to be written and both need additional initialization code. Only the specific details are different.
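The wrap-then-register pattern can also be tried without any real interpreter installed. The following is a minimal sketch in plain C under stated assumptions: `ToyCmdProc`, `toy_create_command`, and `toy_eval` are hypothetical stand-ins for an interpreter's command API (they are not part of Tcl, Perl, or Python), and the range check on the argument is an added assumption to keep `fact()` from overflowing an `int`.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The C function being exposed (same logic as in the text). */
int fact(int n) {
    return (n <= 1) ? 1 : n * fact(n - 1);
}

/* Hypothetical toy interpreter API: a command is a name plus a wrapper. */
#define TOY_OK    0
#define TOY_ERROR 1
#define TOY_MAX_COMMANDS 16

typedef int (*ToyCmdProc)(int argc, const char *argv[], char *result, size_t size);

typedef struct {
    const char *name;
    ToyCmdProc  proc;
} ToyCommand;

static ToyCommand toy_commands[TOY_MAX_COMMANDS];
static int toy_command_count = 0;

/* Registration step: tell the "interpreter" about a wrapper. */
int toy_create_command(const char *name, ToyCmdProc proc) {
    if (toy_command_count >= TOY_MAX_COMMANDS)
        return TOY_ERROR;
    toy_commands[toy_command_count].name = name;
    toy_commands[toy_command_count].proc = proc;
    toy_command_count++;
    return TOY_OK;
}

/* Wrapper: gather and validate arguments, call fact(), convert the result. */
int wrap_fact(int argc, const char *argv[], char *result, size_t size) {
    char *end;
    long value;

    if (argc != 2) {
        snprintf(result, size, "wrong # args");
        return TOY_ERROR;
    }
    errno = 0;
    value = strtol(argv[1], &end, 10);
    /* Assumption: only 0..12 is accepted, since 13! overflows a 32-bit int. */
    if (errno != 0 || end == argv[1] || *end != '\0' || value < 0 || value > 12) {
        snprintf(result, size, "expected integer in 0..12");
        return TOY_ERROR;
    }
    snprintf(result, size, "%d", fact((int) value));
    return TOY_OK;
}

/* Dispatch: look up argv[0] in the command table and run its wrapper. */
int toy_eval(int argc, const char *argv[], char *result, size_t size) {
    int i;
    if (argc < 1) {
        snprintf(result, size, "empty command");
        return TOY_ERROR;
    }
    for (i = 0; i < toy_command_count; i++) {
        if (strcmp(toy_commands[i].name, argv[0]) == 0)
            return toy_commands[i].proc(argc, argv, result, size);
    }
    snprintf(result, size, "invalid command name \"%s\"", argv[0]);
    return TOY_ERROR;
}
```

After `toy_create_command("fact", wrap_fact)`, evaluating the words `fact 4` through `toy_eval` leaves the string `24` in the result buffer, while `fact abc` is rejected by the wrapper before `fact()` is ever called.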
4.2.2 Variable linking
Variable linking refers to the problem of mapping a C/C++ global variable to a variable in the scripting language interpreter. For example, suppose you had the following variable:
double Foo = 3.5;
It might be nice to access it from a script as follows (shown for Perl):
$a = $Foo * 2.3; # Evaluation
$Foo = $a + 2.0; # Assignment
To provide such access, variables are commonly manipulated using a pair of get/set functions. For example, whenever the value of a variable is read, a "get" function is invoked. Similarly, whenever the value of a variable is changed, a "set" function is called.
In many languages, calls to the get/set functions can be attached to evaluation and assignment operators. Therefore, evaluating a variable such as $Foo might implicitly call the get function. Similarly, typing $Foo = 4 would call the underlying set function to change the value.
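As a rough illustration of the get/set idea, the sketch below links the C global `Foo` to a name that a script could read and write. It assumes a hypothetical interpreter that resolves variable names through a table of function pointers; `LinkedVar`, `script_read`, and `script_write` are invented for this example and do not belong to any real scripting API.

```c
#include <stddef.h>
#include <string.h>

double Foo = 3.5;   /* the C global variable from the text */

/* The get/set pair that hides direct access to Foo. */
double Foo_get(void)         { return Foo; }
void   Foo_set(double value) { Foo = value; }

/* Hypothetical link table: script variable name -> get/set functions. */
typedef struct {
    const char *name;
    double (*get)(void);
    void   (*set)(double);
} LinkedVar;

static LinkedVar linked_vars[] = {
    { "Foo", Foo_get, Foo_set }
};

static LinkedVar *find_var(const char *name) {
    size_t i;
    for (i = 0; i < sizeof linked_vars / sizeof linked_vars[0]; i++) {
        if (strcmp(linked_vars[i].name, name) == 0)
            return &linked_vars[i];
    }
    return NULL;
}

/* What the interpreter would do when a script evaluates $Foo. */
int script_read(const char *name, double *out) {
    LinkedVar *var = find_var(name);
    if (var == NULL)
        return -1;
    *out = var->get();
    return 0;
}

/* What the interpreter would do when a script assigns to $Foo. */
int script_write(const char *name, double value) {
    LinkedVar *var = find_var(name);
    if (var == NULL)
        return -1;
    var->set(value);
    return 0;
}
```

With this table in place, the Perl statements shown earlier correspond to one `script_read("Foo", ...)` call for the evaluation and one `script_write("Foo", ...)` call for the assignment. The C variable itself is never touched directly by the interpreter.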
4.2.3 Constants
In many cases, a C program or library may define a large collection of constants. For example:
#define RED 0xff0000
#define BLUE 0x0000ff
#define GREEN 0x00ff00
To make constants available, their values can be stored in scripting language variables such as $RED, $BLUE, and $GREEN. Virtually all scripting languages provide C functions for creating variables, so installing constants is usually a trivial exercise.
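To make the "installing constants" step concrete, here is a small sketch in C. It assumes a hypothetical interpreter whose variables live in a simple name/value table; `script_set_var`, `script_get_var`, and `install_constants` are illustrative names only. A real interpreter would supply its own variable-creation function in place of `script_set_var`.

```c
#include <string.h>

#define RED   0xff0000
#define BLUE  0x0000ff
#define GREEN 0x00ff00

/* Hypothetical interpreter variable table. */
#define SCRIPT_MAX_VARS 32

typedef struct {
    const char *name;
    long        value;
} ScriptVar;

static ScriptVar script_vars[SCRIPT_MAX_VARS];
static int script_var_count = 0;

/* Create a script variable, or overwrite it if it already exists. */
int script_set_var(const char *name, long value) {
    int i;
    for (i = 0; i < script_var_count; i++) {
        if (strcmp(script_vars[i].name, name) == 0) {
            script_vars[i].value = value;
            return 0;
        }
    }
    if (script_var_count >= SCRIPT_MAX_VARS)
        return -1;
    script_vars[script_var_count].name = name;
    script_vars[script_var_count].value = value;
    script_var_count++;
    return 0;
}

/* Look up a script variable by name. */
int script_get_var(const char *name, long *out) {
    int i;
    for (i = 0; i < script_var_count; i++) {
        if (strcmp(script_vars[i].name, name) == 0) {
            *out = script_vars[i].value;
            return 0;
        }
    }
    return -1;
}

/* Called once at module initialization: copy each #define into a variable. */
void install_constants(void) {
    script_set_var("RED", RED);
    script_set_var("BLUE", BLUE);
    script_set_var("GREEN", GREEN);
}
```

Because the preprocessor has already replaced each macro with its value by the time `install_constants()` is compiled, the interpreter only ever sees ordinary variables.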
4.2.4 Structures and classes
Although scripting languages have no trouble accessing simple functions and variables, accessing C/C++ structures and classes presents a different problem. This is because the implementation of structures is largely related to the problem of data representation and layout. Furthermore, certain language features are difficult to map to an interpreter. For instance, what does C++ inheritance mean in a Perl interface?
The most straightforward technique for handling structures is to implement a collection of accessor functions that hide the underlying representation of a structure. For example,
struct Vector {
    Vector();
    ~Vector();
    double x, y, z;
};
can be transformed into the following set of functions:
Vector *new_Vector();
void delete_Vector(Vector *v);
double Vector_x_get(Vector *v);
double Vector_y_get(Vector *v);
double Vector_z_get(Vector *v);
void Vector_x_set(Vector *v, double x);
void Vector_y_set(Vector *v, double y);
void Vector_z_set(Vector *v, double z);
Now, from an interpreter these functions might be used as follows:
% set v [new_Vector]
% Vector_x_set $v 3.5
% Vector_y_get $v
% delete_Vector $v
% ...
Since accessor functions provide a mechanism for accessing the internals of an object, the interpreter does not need to know anything about the actual representation of a Vector.
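The bodies of these accessor functions are short. The sketch below gives one possible implementation in plain C, assuming a C version of `Vector` with no constructor or destructor. The zero initialization in `new_Vector()` is an assumption of this example, standing in for whatever the C++ constructor would do.

```c
#include <stdlib.h>

/* Plain C stand-in for the Vector structure from the text. */
typedef struct Vector {
    double x, y, z;
} Vector;

/* Plays the role of the constructor; returns NULL if allocation fails. */
Vector *new_Vector(void) {
    Vector *v = malloc(sizeof *v);
    if (v != NULL) {
        v->x = 0.0;
        v->y = 0.0;
        v->z = 0.0;
    }
    return v;
}

/* Plays the role of the destructor. */
void delete_Vector(Vector *v) {
    free(v);
}

/* Accessors: the only code that needs to know the layout of Vector. */
double Vector_x_get(Vector *v) { return v->x; }
double Vector_y_get(Vector *v) { return v->y; }
double Vector_z_get(Vector *v) { return v->z; }

void Vector_x_set(Vector *v, double x) { v->x = x; }
void Vector_y_set(Vector *v, double y) { v->y = y; }
void Vector_z_set(Vector *v, double z) { v->z = z; }
```

The interpreter only ever holds the pointer returned by `new_Vector()` and passes it back to these functions, so the structure layout can change without affecting any script.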
4.2.5 Proxy classes
In certain cases, it is possible to use the low-level accessor functions to create a proxy class, also known as a shadow class. A proxy class is a special kind of object that gets created in a scripting language to access a C/C++ class (or struct) in a way that looks like the original structure (that is, it proxies the real C++ class). For example, if you have the following C++ definition:
class Vector {
public:
    Vector();
    ~Vector();
    double x, y, z;
};
A proxy class mechanism would allow you to access the structure in a more natural manner from the interpreter. For example, in Python, you might want to do this:
>>> v = Vector()
>>> v.x = 3
>>> v.y = 4
>>> v.z = -13
>>> ...
>>> del v
Similarly, in Perl5 you may want the interface to work like this:
$v = new Vector;
$v->{x} = 3;
$v->{y} = 4;
$v->{z} = -13;
Finally, in Tcl:
Vector v
v configure -x 3 -y 4 -z -13
When proxy classes are used, two objects are really at work: one in the scripting language and an underlying C/C++ object. Operations affect both objects equally and, for all practical purposes, it appears as if you are simply manipulating a C/C++ object.
4.3 Building scripting language extensions
The final step in using a scripting language with your C/C++ application is adding your extensions to the scripting language itself. There are two primary approaches for doing this. The preferred technique is to build a dynamically loadable extension in the form of a shared library. Alternatively, you can recompile the scripting language interpreter with your extensions added to it.
4.3.1 Shared libraries and dynamic loading
To create a shared library or DLL, you often need to look at the manual pages for your compiler and linker. However, the procedure for a few common platforms is shown below:
# Build a shared library for Solaris
gcc -fpic -c example.c example_wrap.c -I/usr/local/include
ld -G example.o example_wrap.o -o example.so
# Build a shared library for Linux
gcc -fpic -c example.c example_wrap.c -I/usr/local/include
gcc -shared example.o example_wrap.o -o example.so
To use your shared library, you simply use the corresponding command in the scripting language (load, import, use, etc.). This will import your module and allow you to start using it. For example:
% load ./example.so
% fact 4
24
%
When working with C++ code, the process of building shared libraries may be more complicated, primarily because C++ modules may need additional code in order to operate correctly. On many machines, you can build a shared C++ module by following the above procedures, but changing the link line to the following:
c++ -shared example.o example_wrap.o -o example.so
4.3.2 Linking with shared libraries
When building extensions as shared libraries, it is not uncommon for your extension to rely upon other shared libraries on your machine. In order for the extension to work, it needs to be able to find all of these libraries at run-time. Otherwise, you may get an error such as the following:
>>> import graph
Traceback (innermost last):
File "<stdin>", line 1, in ?
File "/home/sci/data1/beazley/graph/graph.py", line 2, in ?
import graphc
ImportError: 1101:/home/sci/data1/beazley/bin/python: rld: Fatal Error: cannot
successfully map soname 'libgraph.so' under any of the filenames /usr/lib/libgraph.so:/
lib/libgraph.so:/lib/cmplrs/cc/libgraph.so:/usr/lib/cmplrs/cc/libgraph.so:
>>>
What this error means is that the extension module created by SWIG depends upon a shared library called "libgraph.so" that the system was unable to locate. To fix this problem, there are a few approaches you can take.
- Link your extension and explicitly tell the linker where the required libraries are located. Often, this can be done with a special linker flag such as -R, -rpath, etc. This is not implemented in a standard manner, so read the man pages for your linker to find out more about how to set the search path for shared libraries.
- Put shared libraries in the same directory as the executable. This technique is sometimes required for correct operation on non-Unix platforms.
- Set the UNIX environment variable LD_LIBRARY_PATH to the directory where shared libraries are located before running Python. Although this is an easy solution, it is not recommended. Consider setting the path using linker options instead.
4.3.3 Static linking
With static linking, you rebuild the scripting language interpreter with extensions. The process usually involves compiling a short main program that adds your customized commands to the language and starts the interpreter. You then link your program with a library to produce a new scripting language executable.
Although static linking is supported on all platforms, this is not the preferred technique for building scripting language extensions. In fact, there are very few practical reasons for doing this. Consider using shared libraries instead.