PyTables Tutorial (1): Getting Started, Browsing the Object Tree, Committing Data to Tables and Arrays

Reposted article, 2021-11-27 11:02. Tag: PyTables

Translated from http://www.pytables.org/usersguide/tutorials.html

Tutorials

Seràs la clau que obre tots els panys, seràs la llum, la llum il.limitada, seràs confí on l’aurora comença, seràs forment, escala il.luminada!

—Lyrics: Vicent Andrés i Estellés. Music: Ovidi Montllor, Toti Soler, M’aclame a tu

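This chapter consists of a series of simple yet comprehensive tutorials that will enable you to understand the main features of PyTables. If you would like more information about a particular instance variable, global function, or method, look at the docstrings or go to the Library Reference.

Please note that throughout this document the terms column and field are used interchangeably, as are row and record.

Getting started

This section shows how to define records in Python and save collections of them (that is, a table) to a file. We will then select some of the data in the table using Python cuts, and create NumPy arrays to store this selection as separate objects in the tree.

All the code in this section can be found in examples/tutorial1-1.py. Nevertheless, this tutorial series has been written so that you can reproduce it in a Python interactive console. I encourage you to try the examples yourself as you read!

1. Importing tables objects

Before starting, you need to import the public objects in the tables package:

    >>> import tables

This is the recommended way to import tables if you do not want to pollute your namespace. However, since PyTables contains a set of first-level primitives, you may consider the alternative:

    >>> from tables import *

If you are going to work with NumPy arrays (as is normally the case), you will also need to import functions from the numpy package. So most PyTables programs begin with:

    >>> import tables   # but in this tutorial we use "from tables import *"
    >>> import numpy as np
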

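2. Declaring a column descriptor (object class)

Suppose we have data from a particle detector and want to create a table object to store it.

First we need to define the table: the number of columns, the type of data held in each column, and so on.

The particle detector produces the following data:
A TDC (Time to Digital Converter) counter with a dynamic range of 8 bits: variable TDCcount;
A 16-bit ADC (Analog to Digital Converter): variable ADCcount;
The grid position of the particle: variables grid_i and grid_j;
The pressure of the particle (single precision): variable pressure;
The energy of the particle (double precision): variable energy;
The kind of particle: variable name, a 16-character string;
A number identifying the stored particle: variable idnumber, a 64-bit integer.

Having determined the columns and their types, we can now declare a new Particle class that will contain all this information:

    >>> from tables import *
    >>> class Particle(IsDescription):
    ...     name      = StringCol(16)   # 16-character String
    ...     idnumber  = Int64Col()      # Signed 64-bit integer
    ...     ADCcount  = UInt16Col()     # Unsigned short integer
    ...     TDCcount  = UInt8Col()      # unsigned byte
    ...     grid_i    = Int32Col()      # 32-bit integer
    ...     grid_j    = Int32Col()      # 32-bit integer
    ...     pressure  = Float32Col()    # float  (single-precision)
    ...     energy    = Float64Col()    # double (double-precision)
    >>>

The definition of this class is self-explanatory. Basically, you declare a class variable for each field you need. As its value you assign an instance of the appropriate Col subclass, according to the kind of column defined (the data type, the length, the shape, and so on). See The Col class and its descendants for a complete description of these subclasses. See also Supported data types in PyTables for a list of the data types supported by the Col constructor.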
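As a rough illustration of what such a description carries, the sketch below inspects the declared columns and adds up their sizes. It assumes that IsDescription subclasses expose their Col instances through a `columns` dictionary and that each Col has `type` and `itemsize` attributes; the variable `row_bytes` is only a name chosen for this example.

```python
import tables


class Particle(tables.IsDescription):
    name = tables.StringCol(16)      # 16-byte string
    idnumber = tables.Int64Col()     # signed 64-bit integer
    ADCcount = tables.UInt16Col()    # unsigned 16-bit integer
    TDCcount = tables.UInt8Col()     # unsigned 8-bit integer
    grid_i = tables.Int32Col()       # signed 32-bit integer
    grid_j = tables.Int32Col()       # signed 32-bit integer
    pressure = tables.Float32Col()   # single-precision float
    energy = tables.Float64Col()     # double-precision float


# Assumption: the declared Col instances are collected in `Particle.columns`.
for colname, col in sorted(Particle.columns.items()):
    print(f"{colname:10s} type={col.type:8s} itemsize={col.itemsize}")

# Sum of the per-column sizes: the number of bytes one packed row occupies.
row_bytes = sum(col.itemsize for col in Particle.columns.values())
print("bytes per row:", row_bytes)
```

The total (16 + 8 + 2 + 1 + 4 + 4 + 4 + 8 bytes) should agree with the size of the compound type that HDF5 tools report for a table built from this description.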

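From now on, we can use Particle as the descriptor for our detector data table. But first, we have to create a file where all this data will be stored.

3. Creating a PyTables file

Use the top-level open_file() function to create a PyTables file:

    >>> h5file = open_file("tutorial1.h5", mode="w", title="Test file")

This statement creates a file named "tutorial1.h5" in the current working directory in "w"rite mode, with a descriptive title string ("Test file"). The open_file() function is one of the objects imported by the `from tables import *` statement. It tries to open the file and, if successful, returns a File object instance (see The File Class).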
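For scripts, as opposed to an interactive session, it can be convenient to let Python close the file automatically. The following is a minimal sketch that uses open_file() as a context manager; the temporary directory and the variable names are choices made for this example only.

```python
import os
import tempfile

import tables

# Write into a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "tutorial1.h5")

    # mode="w" creates the file, truncating any existing one.
    with tables.open_file(path, mode="w", title="Test file") as h5file:
        file_title = h5file.title
        open_inside = h5file.isopen
        root_path = h5file.root._v_pathname

    # Leaving the `with` block closes the file.
    open_after = h5file.isopen

print(file_title, bool(open_inside), bool(open_after), root_path)
```

The rest of this tutorial keeps the file open across many console commands, which is why it calls close() explicitly at the end instead.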

4. Creating a new group

Now, to better organize our data, we call the File.create_group() method of h5file to create a group named detector that branches from the root node "/" (see The Group class). The resulting Group object is bound to the variable group:

    >>> group = h5file.create_group("/", 'detector', 'Detector information')

5. Creating a new table

Next, we call the File.create_table() method of the h5file object to create a table (see The Table class) with node name "readout" under group. The table is described by the Particle class and is given the title "Readout example". The resulting Table object is bound to the variable table:

    >>> table = h5file.create_table(group, 'readout', Particle, "Readout example")

Print the File instance to check the object tree created so far:

    >>> print(h5file)
    tutorial1.h5 (File) 'Test file'
    Last modif.: 'Wed Mar 7 11:06:12 2007'
    Object Tree:
    / (RootGroup) 'Test file'
    /detector (Group) 'Detector information'
    /detector/readout (Table(0,)) 'Readout example'

As you can see, the group and table objects we just created are easy to spot. If you want more information, just type the name of the variable holding the File instance:

    >>> h5file
    File(filename='tutorial1.h5', title='Test file', mode='w', root_uep='/', filters=Filters(complevel=0, shuffle=False, bitshuffle=False, fletcher32=False))
    / (RootGroup) 'Test file'
    /detector (Group) 'Detector information'
    /detector/readout (Table(0,)) 'Readout example'
      description := {
        "ADCcount": UInt16Col(shape=(), dflt=0, pos=0),
        "TDCcount": UInt8Col(shape=(), dflt=0, pos=1),
        "energy": Float64Col(shape=(), dflt=0.0, pos=2),
        "grid_i": Int32Col(shape=(), dflt=0, pos=3),
        "grid_j": Int32Col(shape=(), dflt=0, pos=4),
        "idnumber": Int64Col(shape=(), dflt=0, pos=5),
        "name": StringCol(itemsize=16, shape=(), dflt='', pos=6),
        "pressure": Float32Col(shape=(), dflt=0.0, pos=7)}
      byteorder := 'little'
      chunkshape := (87,)

More detailed information is displayed about each object in the tree. Note how Particle, our table descriptor class, is printed as part of the readout table description.

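Now we fill the table with some values. First, we get a pointer to the Row instance (see The Row class) of this table:

    >>> particle = table.row

The row attribute of table points to the Row instance that will be used to write data rows into the table. We write data simply by assigning values to the Row instance as if it were a dictionary (although it is actually an extension class), using the column names as keys.

Below is an example of how to write rows:

    >>> for i in range(10):
    ...     particle['name'] = f'Particle: {i:6d}'
    ...     particle['TDCcount'] = i % 256
    ...     particle['ADCcount'] = (i * 256) % (1 << 16)
    ...     particle['grid_i'] = i
    ...     particle['grid_j'] = 10 - i
    ...     particle['pressure'] = float(i*i)
    ...     particle['energy'] = float(particle['pressure'] ** 4)
    ...     particle['idnumber'] = i * (2 ** 34)
    ...     # Insert a new particle record
    ...     particle.append()
    >>>

The lines inside the loop just assign values to the different columns of the Row instance particle (see The Row class). A call to its append() method writes this information to the table I/O buffer.

After all the data has been processed, call the table.flush() method to flush the table's I/O buffer and write the data to disk:

    >>> table.flush()

Flushing a table is a very important step: it not only helps to maintain the integrity of your file, it also frees valuable memory resources (the internal buffers).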
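The write path described above (assign to the Row, append(), then flush()) can be sketched end to end as follows. The two-column `Reading` description, the file name and the temporary directory are hypothetical and exist only to keep the example short; the file is reopened read-only to check that the rows really reached the disk.

```python
import os
import tempfile

import tables


class Reading(tables.IsDescription):   # hypothetical two-column description
    idnumber = tables.Int64Col()
    pressure = tables.Float32Col()


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "flush_demo.h5")

    with tables.open_file(path, mode="w") as h5file:
        table = h5file.create_table("/", "readout", Reading, "Flush demo")
        row = table.row
        for i in range(10):
            row["idnumber"] = i * (2 ** 34)
            row["pressure"] = float(i * i)
            row.append()      # goes to the in-memory I/O buffer
        table.flush()         # pushes the buffered rows to disk

    # Reopen read-only and confirm the rows are in the file.
    with tables.open_file(path, mode="r") as h5file:
        stored = h5file.root.readout
        nrows = stored.nrows
        last = stored[nrows - 1]
        last_pressure = float(last["pressure"])
        last_id = int(last["idnumber"])

print(nrows, last_pressure, last_id)
```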

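6. Reading (and selecting) data in a table

Now that the data is on disk, we need to access it and select the values we are interested in from specific columns. See the example below:

    >>> table = h5file.root.detector.readout
    >>> pressure = [x['pressure'] for x in table.iterrows() if x['TDCcount'] > 3 and 20 <= x['pressure'] < 50]
    >>> pressure
    [25.0, 36.0, 49.0]

The first line creates a shortcut to the readout table. Here the table is accessed using natural naming. We could also have used the h5file.get_node() method, as we will do later on.

The next two lines are a Python list comprehension. They loop over the rows of table as provided by the Table.iterrows() iterator. The rows are filtered with this condition:

    x['TDCcount'] > 3 and 20 <= x['pressure'] < 50

So, we pick the 'pressure' column from the filtered records to build the final list and assign it to the pressure variable.

The same could be done with a regular for loop, but the comprehension syntax is more compact and elegant.

PyTables also offers other, more powerful ways of performing selections, which may be more suitable if your tables are very large or if you need very high query speeds. They are called in-kernel and indexed queries, and they are available through Table.where() and other related methods.

Let's use an in-kernel selection to perform the same selection as before, this time retrieving the name column:

    >>> names = [ x['name'] for x in table.where("""(TDCcount > 3) & (20 <= pressure) & (pressure < 50)""") ]
    >>> names
    ['Particle:      5', 'Particle:      6', 'Particle:      7']

In-kernel and indexed queries are not only much faster, they also look more compact. They are among the most powerful features of PyTables, so use them as much as possible. See Condition Syntax and Accelerating your searches for more information on in-kernel and indexed selections.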
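To see that both styles select the same rows, here is a small self-contained sketch. It builds a hypothetical two-column table in a temporary file, runs the plain Python selection and the in-kernel one, and also uses Table.read_where(), which returns the matching rows as a single structured NumPy array.

```python
import os
import tempfile

import tables


class Particle(tables.IsDescription):   # reduced description for this sketch
    TDCcount = tables.UInt8Col()
    pressure = tables.Float32Col()


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "query_demo.h5")
    with tables.open_file(path, mode="w") as h5file:
        table = h5file.create_table("/", "readout", Particle)
        row = table.row
        for i in range(10):
            row["TDCcount"] = i
            row["pressure"] = float(i * i)
            row.append()
        table.flush()

        # Regular Python selection: every row is handed to the interpreter.
        python_sel = [float(r["pressure"]) for r in table.iterrows()
                      if r["TDCcount"] > 3 and 20 <= r["pressure"] < 50]

        # In-kernel selection: the condition string is evaluated by PyTables.
        condition = "(TDCcount > 3) & (20 <= pressure) & (pressure < 50)"
        kernel_sel = [float(r["pressure"]) for r in table.where(condition)]

        # read_where() returns all matching rows in one structured array.
        matched = table.read_where(condition)
        matched_tdc = [int(v) for v in matched["TDCcount"]]

print(python_sel, kernel_sel, matched_tdc)
```

Note that the condition string uses `&` rather than `and`, and that a chained comparison such as `20 <= pressure < 50` has to be split into two parenthesised terms.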

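Note

Special care should be taken when the query condition includes string literals. Python 2 string literals are byte strings, while Python 3 strings are unicode objects (text strings).

With regard to the Particle definition above, it has to be noted that the type of the "name" column does not change depending on the Python version used (of course). It always corresponds to byte strings.

Any condition involving the "name" column should therefore be written using the appropriate type of literal, in order to avoid a TypeError.

Suppose we want to get the rows corresponding to specific particle names. The code below works in Python 2 but raises a TypeError in Python 3:

    >>> condition = '(name == "Particle:      5") | (name == "Particle:      7")'
    >>> for record in table.where(condition):  # TypeError in Python3
    ...     # do something with "record"

The reason is that in Python 3 "condition" implies a comparison between a byte string (the contents of the "name" column) and a unicode literal.

The correct way to write the condition is:

    >>> condition = '(name == b"Particle:      5") | (name == b"Particle:      7")'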
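The sketch below, written for Python 3, shows the bytes-based condition in action on a hypothetical one-column table. As an alternative to embedding bytes literals in the condition string, it also passes the value through the `condvars` argument of Table.where(); the variable name `target` is made up for this example.

```python
import os
import tempfile

import tables


class Particle(tables.IsDescription):   # reduced description for this sketch
    name = tables.StringCol(16)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "bytes_demo.h5")
    with tables.open_file(path, mode="w") as h5file:
        table = h5file.create_table("/", "readout", Particle)
        row = table.row
        for i in range(10):
            row["name"] = f"Particle: {i:6d}"   # stored as a 16-byte string
            row.append()
        table.flush()

        # Bytes literals written directly in the condition string.
        condition = '(name == b"Particle:      5") | (name == b"Particle:      7")'
        literal_hits = [bytes(r["name"]) for r in table.where(condition)]

        # Alternative: hand the bytes value over through `condvars`.
        wanted = f"Particle: {5:6d}".encode("ascii")
        var_hits = [bytes(r["name"])
                    for r in table.where("name == target",
                                         condvars={"target": wanted})]

print(literal_hits, var_hits)
```

Building the searched value with the same format string that was used when writing avoids mistakes in the padding, since `{i:6d}` right-aligns the number in a six-character field.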
              

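The next section shows how to save these selected results to a file.

7. Creating new array objects

In order to separate the selected data from the mass of detector data, we will create a new group named columns branching off the root group. Under that group, we will then create two arrays containing the selected data. First, we create the group:

    >>> gcolumns = h5file.create_group(h5file.root, "columns", "Pressure and Name")

Note that this time we specified the first parameter using natural naming (h5file.root) instead of an absolute path string ("/").

Now, create the first of the two Array objects we have just mentioned:

    >>> h5file.create_array(gcolumns, 'pressure', np.array(pressure), "Pressure column selection")
    /columns/pressure (Array(3,)) 'Pressure column selection'
      atom := Float64Atom(shape=(), dflt=0.0)
      maindim := 0
      flavor := 'numpy'
      byteorder := 'little'
      chunkshape := None

We already know the first two parameters of the File.create_array() method (they are the same as the first two in create_table): the parent group where the array will be created and the name of the Array instance. The third parameter is the object we want to save to disk. In this case, it is a NumPy array built from the selection list we created before. The fourth parameter is the title.

Now, we will save the second array. It contains the list of strings we selected before; we save this object as is, with no further conversion:

    >>> h5file.create_array(gcolumns, 'name', names, "Name column selection")
    /columns/name (Array(3,)) 'Name column selection'
      atom := StringAtom(itemsize=16, shape=(), dflt='')
      maindim := 0
      flavor := 'python'
      byteorder := 'irrelevant'
      chunkshape := None

As you can see, File.create_array() accepts names (which is a regular Python list) as the object parameter. Actually, it accepts a variety of regular object types (see create_array()) as parameters. The flavor attribute (see the output above) records the original kind of object that was saved. Based on this flavor, PyTables will later be able to retrieve exactly the same kind of object from disk.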
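A short sketch of the flavor mechanism, under the assumption that a plain list of integers behaves like the list used above: one array is created from a NumPy array and one from a Python list, and both are read back. The node names and the temporary file are hypothetical.

```python
import os
import tempfile

import numpy as np
import tables

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "flavor_demo.h5")
    with tables.open_file(path, mode="w") as h5file:
        group = h5file.create_group("/", "columns", "Flavor demo")

        # Saved from a NumPy array: the flavor is recorded as 'numpy'.
        from_numpy = h5file.create_array(
            group, "pressure", np.array([25.0, 36.0, 49.0]), "From a NumPy array")

        # Saved from a plain Python list: the flavor is recorded as 'python'.
        from_list = h5file.create_array(
            group, "counts", [5, 6, 7], "From a Python list")

        flavors = (from_numpy.flavor, from_list.flavor)

        # read() converts the stored data back to the recorded flavor.
        numpy_back = from_numpy.read()
        list_back = from_list.read()

print(flavors, type(numpy_back).__name__, type(list_back).__name__)
```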

 
    >>> print(h5file)
    tutorial1.h5 (File) 'Test file'
    Last modif.: 'Wed Mar 7 19:40:44 2007'
    Object Tree:
    / (RootGroup) 'Test file'
    /columns (Group) 'Pressure and Name'
    /columns/name (Array(3,)) 'Name column selection'
    /columns/pressure (Array(3,)) 'Pressure column selection'
    /detector (Group) 'Detector information'
    /detector/readout (Table(10,)) 'Readout example'

8. Closing the file and looking at its content

Finally, we close the file with the close method before exiting Python:

    >>> h5file.close()
    >>> ^D
    $

You have now created your first PyTables file, containing a table and two arrays. You can examine it with any generic HDF5 tool, such as h5dump or h5ls. Here is what tutorial1.h5 looks like when read with the h5ls program:

    $ h5ls -rd tutorial1.h5
    /columns                 Group
    /columns/name            Dataset {3}
        Data:
            (0) "Particle:      5", "Particle:      6", "Particle:      7"
    /columns/pressure        Dataset {3}
        Data:
            (0) 25, 36, 49
    /detector                Group
    /detector/readout        Dataset {10/Inf}
        Data:
            (0) {0, 0, 0, 0, 10, 0, "Particle:      0", 0},
            (1) {256, 1, 1, 1, 9, 17179869184, "Particle:      1", 1},
            (2) {512, 2, 256, 2, 8, 34359738368, "Particle:      2", 4},
            (3) {768, 3, 6561, 3, 7, 51539607552, "Particle:      3", 9},
            (4) {1024, 4, 65536, 4, 6, 68719476736, "Particle:      4", 16},
            (5) {1280, 5, 390625, 5, 5, 85899345920, "Particle:      5", 25},
            (6) {1536, 6, 1679616, 6, 4, 103079215104, "Particle:      6", 36},
            (7) {1792, 7, 5764801, 7, 3, 120259084288, "Particle:      7", 49},
            (8) {2048, 8, 16777216, 8, 2, 137438953472, "Particle:      8", 64},
            (9) {2304, 9, 43046721, 9, 1, 154618822656, "Particle:      9", 81}

Here is the output as displayed by the "ptdump" PyTables utility (located in the utils/ directory):

    $ ptdump tutorial1.h5
    / (RootGroup) 'Test file'
    /columns (Group) 'Pressure and Name'
    /columns/name (Array(3,)) 'Name column selection'
    /columns/pressure (Array(3,)) 'Pressure column selection'
    /detector (Group) 'Detector information'
    /detector/readout (Table(10,)) 'Readout example'

You can pass the -v or -d options to ptdump if you want more verbosity. Try them out!

Also, Figure 1 shows the tutorial1.h5 file as viewed with the ViTables graphical interface.

Figure 1. The initial version of the data file for tutorial 1, with a view of the data objects.

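Browsing the object tree

In this section, we will learn how to browse the tree and retrieve data as well as meta-information about the actual data.

In examples/tutorial1-2.py you will find a working version of all the code in this section. As before, you are encouraged to use a Python shell and inspect the object tree during the course of the tutorial.

1. Traversing the object tree

Let's start by opening the file we created in the previous section:

    >>> h5file = open_file("tutorial1.h5", "a")

This time, we have opened the file in "a"ppend mode. We use this mode to add more information to the file.

PyTables, following the Python tradition, offers powerful introspection capabilities: you can easily ask for information about any component of the object tree, as well as search the tree.

To start with, you can get a first glimpse of the object tree simply by printing the existing File instance:

    >>> print(h5file)
    tutorial1.h5 (File) 'Test file'
    Last modif.: 'Wed Mar 7 19:50:57 2007'
    Object Tree:
    / (RootGroup) 'Test file'
    /columns (Group) 'Pressure and Name'
    /columns/name (Array(3,)) 'Name column selection'
    /columns/pressure (Array(3,)) 'Pressure column selection'
    /detector (Group) 'Detector information'
    /detector/readout (Table(10,)) 'Readout example'

It looks like all of our objects are there. Now let's use the File iterator to see how to list all the nodes in the object tree:

    >>> for node in h5file:
    ...     print(node)
    / (RootGroup) 'Test file'
    /columns (Group) 'Pressure and Name'
    /detector (Group) 'Detector information'
    /columns/name (Array(3,)) 'Name column selection'
    /columns/pressure (Array(3,)) 'Pressure column selection'
    /detector/readout (Table(10,)) 'Readout example'
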

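We can use the File.walk_groups() method of the File class to list only the groups in the tree:

    >>> for group in h5file.walk_groups():
    ...     print(group)
    / (RootGroup) 'Test file'
    /columns (Group) 'Pressure and Name'
    /detector (Group) 'Detector information'

Note that File.walk_groups() actually returns an iterator, not a list of objects. Using this iterator together with the list_nodes() method is a powerful combination. Let's see an example that lists all the arrays in the tree:

    >>> for group in h5file.walk_groups("/"):
    ...     for array in h5file.list_nodes(group, classname='Array'):
    ...         print(array)
    /columns/name (Array(3,)) 'Name column selection'
    /columns/pressure (Array(3,)) 'Pressure column selection'

File.list_nodes() returns a list containing all the nodes hanging off a specific Group. If the classname keyword is specified, the method filters out all instances that are not descendants of that class. In this example we asked only for Array instances. There is also an iterator counterpart called File.iter_nodes(), which might be handy in some situations, such as when dealing with groups that have a large number of nodes behind them.

We can combine both calls by using the File.walk_nodes() method of the File object. For example:

    >>> for array in h5file.walk_nodes("/", "Array"):
    ...     print(array)
    /columns/name (Array(3,)) 'Name column selection'
    /columns/pressure (Array(3,)) 'Pressure column selection'

This is a nice shortcut when working interactively.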
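The equivalence between the two-step traversal and walk_nodes() can be checked with a small sketch like the one below. The tree it builds (two groups and three small arrays in a temporary file) is hypothetical, and the results are sorted before comparison so that the check does not depend on iteration order.

```python
import os
import tempfile

import tables

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "walk_demo.h5")
    with tables.open_file(path, mode="w") as h5file:
        detector = h5file.create_group("/", "detector")
        columns = h5file.create_group("/", "columns")
        h5file.create_array(columns, "pressure", [25.0, 36.0, 49.0])
        h5file.create_array(columns, "counts", [5, 6, 7])
        h5file.create_array(detector, "gains", [1.0, 2.0])

        # All groups, starting at the root.
        group_paths = sorted(g._v_pathname for g in h5file.walk_groups("/"))

        # Two-step traversal: walk the groups, then list the arrays in each.
        two_step = sorted(node._v_pathname
                          for g in h5file.walk_groups("/")
                          for node in h5file.list_nodes(g, classname="Array"))

        # One-step traversal with walk_nodes().
        one_step = sorted(node._v_pathname
                          for node in h5file.walk_nodes("/", "Array"))

print(group_paths)
print(one_step)
```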

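Finally, we will list all the leaves, i.e. Table and Array instances, in the /detector group (see The Leaf class for detailed information on the Leaf class). Note that only one instance of the Table class (i.e. readout) will be selected in this group:

    >>> for leaf in h5file.root.detector._f_walknodes('Leaf'):
    ...     print(leaf)
    /detector/readout (Table(10,)) 'Readout example'

Here we called the Group._f_walknodes() method on a group reached through the natural naming path specification.

Of course, you can do more sophisticated node selections using these powerful methods. But first, let's take a look at some important PyTables object instance variables.

2. Setting and getting user attributes

PyTables lets you attach attributes to node objects by means of the AttributeSet class (see The AttributeSet class). You can access this object through the standard attribute attrs in Leaf nodes and _v_attrs in Group nodes.

Suppose we want to save the date on which the data in /detector/readout was gathered, as well as the temperature during the gathering process:

    >>> table = h5file.root.detector.readout
    >>> table.attrs.gath_date = "Wed, 06/12/2003 18:33"
    >>> table.attrs.temperature = 18.4
    >>> table.attrs.temp_scale = "Celsius"

Now, let's set a somewhat more complex attribute in the /detector group:

    >>> detector = h5file.root.detector
    >>> detector._v_attrs.stuff = [5, (2.3, 4.5), "Integer and tuple"]

Note that because detector is a Group node, the AttributeSet instance is accessed through the _v_attrs attribute. In general, you can save any standard Python data structure as an attribute of a node. See The AttributeSet class for a more detailed explanation of how attributes are serialized for export to disk.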

Retrieving the attributes:

    >>> table.attrs.gath_date
    'Wed, 06/12/2003 18:33'
    >>> table.attrs.temperature
    18.399999999999999
    >>> table.attrs.temp_scale
    'Celsius'
    >>> detector._v_attrs.stuff
    [5, (2.2999999999999998, 4.5), 'Integer and tuple']

Deleting an attribute:

    >>> del table.attrs.gath_date

Look at the current attribute set of /detector/readout (if you are working in a Unix Python console with the rlcompleter module active, try pressing the TAB key twice):

    >>> table.attrs
    /detector/readout._v_attrs (AttributeSet), 23 attributes:
       [CLASS := 'TABLE',
        FIELD_0_FILL := 0,
        FIELD_0_NAME := 'ADCcount',
        FIELD_1_FILL := 0,
        FIELD_1_NAME := 'TDCcount',
        FIELD_2_FILL := 0.0,
        FIELD_2_NAME := 'energy',
        FIELD_3_FILL := 0,
        FIELD_3_NAME := 'grid_i',
        FIELD_4_FILL := 0,
        FIELD_4_NAME := 'grid_j',
        FIELD_5_FILL := 0,
        FIELD_5_NAME := 'idnumber',
        FIELD_6_FILL := '',
        FIELD_6_NAME := 'name',
        FIELD_7_FILL := 0.0,
        FIELD_7_NAME := 'pressure',
        FLAVOR := 'numpy',
        NROWS := 10,
        TITLE := 'Readout example',
        VERSION := '2.6',
        temp_scale := 'Celsius',
        temperature := 18.399999999999999]

We have got all the attributes (including the system attributes). You can get a list of all attributes, or only the user or system attributes, with the _f_list() method:

    >>> print(table.attrs._f_list("all"))
    ['CLASS', 'FIELD_0_FILL', 'FIELD_0_NAME', 'FIELD_1_FILL', 'FIELD_1_NAME', 'FIELD_2_FILL', 'FIELD_2_NAME', 'FIELD_3_FILL', 'FIELD_3_NAME', 'FIELD_4_FILL', 'FIELD_4_NAME', 'FIELD_5_FILL', 'FIELD_5_NAME', 'FIELD_6_FILL', 'FIELD_6_NAME', 'FIELD_7_FILL', 'FIELD_7_NAME', 'FLAVOR', 'NROWS', 'TITLE', 'VERSION', 'temp_scale', 'temperature']
    >>> print(table.attrs._f_list("user"))
    ['temp_scale', 'temperature']
    >>> print(table.attrs._f_list("sys"))
    ['CLASS', 'FIELD_0_FILL', 'FIELD_0_NAME', 'FIELD_1_FILL', 'FIELD_1_NAME', 'FIELD_2_FILL', 'FIELD_2_NAME', 'FIELD_3_FILL', 'FIELD_3_NAME', 'FIELD_4_FILL', 'FIELD_4_NAME', 'FIELD_5_FILL', 'FIELD_5_NAME', 'FIELD_6_FILL', 'FIELD_6_NAME', 'FIELD_7_FILL', 'FIELD_7_NAME', 'FLAVOR', 'NROWS', 'TITLE', 'VERSION']

Renaming an attribute:

    >>> table.attrs._f_rename("temp_scale","tempScale")
    >>> print(table.attrs._f_list())
    ['tempScale', 'temperature']


As of PyTables 2.0, you are also allowed to set, delete or rename system attributes:

    >>> table.attrs._f_rename("VERSION", "version")
    >>> table.attrs.VERSION
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "tables/attributeset.py", line 222, in __getattr__
        (name, self._v__nodepath)
    AttributeError: Attribute 'VERSION' does not exist in node: '/detector/readout'
    >>> table.attrs.version
    '2.6'

Caveat emptor: you must be careful when modifying system attributes, because you may end up fooling PyTables and getting unwanted behaviour. Use this only if you know what you are doing. So, given the caveat above, we proceed to restore the original name of the VERSION attribute:

    >>> table.attrs._f_rename("version", "VERSION")
    >>> table.attrs.VERSION
    '2.6'

OK, that's better. If you were to terminate your session now, you could use the h5ls command to read the /detector/readout attributes from the file written to disk:

    $ h5ls -vr tutorial1.h5/detector/readout
    Opened "tutorial1.h5" with sec2 driver.
    /detector/readout        Dataset {10/Inf}
        Attribute: CLASS scalar
            Type: 6-byte null-terminated ASCII string
            Data: "TABLE"
        Attribute: VERSION scalar
            Type: 4-byte null-terminated ASCII string
            Data: "2.6"
        Attribute: TITLE scalar
            Type: 16-byte null-terminated ASCII string
            Data: "Readout example"
        Attribute: NROWS scalar
            Type: native long long
            Data: 10
        Attribute: FIELD_0_NAME scalar
            Type: 9-byte null-terminated ASCII string
            Data: "ADCcount"
        Attribute: FIELD_1_NAME scalar
            Type: 9-byte null-terminated ASCII string
            Data: "TDCcount"
        Attribute: FIELD_2_NAME scalar
            Type: 7-byte null-terminated ASCII string
            Data: "energy"
        Attribute: FIELD_3_NAME scalar
            Type: 7-byte null-terminated ASCII string
            Data: "grid_i"
        Attribute: FIELD_4_NAME scalar
            Type: 7-byte null-terminated ASCII string
            Data: "grid_j"
        Attribute: FIELD_5_NAME scalar
            Type: 9-byte null-terminated ASCII string
            Data: "idnumber"
        Attribute: FIELD_6_NAME scalar
            Type: 5-byte null-terminated ASCII string
            Data: "name"
        Attribute: FIELD_7_NAME scalar
            Type: 9-byte null-terminated ASCII string
            Data: "pressure"
        Attribute: FLAVOR scalar
            Type: 5-byte null-terminated ASCII string
            Data: "numpy"
        Attribute: tempScale scalar
            Type: 7-byte null-terminated ASCII string
            Data: "Celsius"
        Attribute: temperature scalar
            Type: native double
            Data: 18.4
        Location: 0:1:0:1952
        Links: 1
        Modified: 2006-12-11 10:35:13 CET
        Chunks: {85} 3995 bytes
        Storage: 470 logical bytes, 3995 allocated bytes, 11.76% utilization
        Type: struct {
            "ADCcount" +0 native unsigned short
            "TDCcount" +2 native unsigned char
            "energy" +3 native double
            "grid_i" +11 native int
            "grid_j" +15 native int
            "idnumber" +19 native long long
            "name" +27 16-byte null-terminated ASCII string
            "pressure" +43 native float
        } 47 bytes

Attributes are a useful mechanism for adding persistent (meta) information to your data.
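The attribute operations of this section can be condensed into one self-contained sketch. It sets attributes on a leaf and on a group, renames one, and then reopens the file to confirm that they persisted. The node names and the temporary file are hypothetical.

```python
import os
import tempfile

import tables

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "attrs_demo.h5")

    with tables.open_file(path, mode="w") as h5file:
        group = h5file.create_group("/", "detector")
        array = h5file.create_array(group, "gains", [1.0, 2.0])

        # Leaf nodes expose their AttributeSet as `attrs` ...
        array.attrs.temperature = 18.4
        array.attrs.temp_scale = "Celsius"
        # ... while Group nodes expose it as `_v_attrs`.
        group._v_attrs.stuff = [5, (2.3, 4.5), "Integer and tuple"]

        array.attrs._f_rename("temp_scale", "tempScale")
        user_attrs = sorted(array.attrs._f_list("user"))

    # Attributes are stored in the file, so they survive a reopen.
    with tables.open_file(path, mode="r") as h5file:
        gains = h5file.root.detector.gains
        temperature = float(gains.attrs.temperature)
        scale = str(gains.attrs.tempScale)
        stuff = h5file.root.detector._v_attrs.stuff

print(user_attrs, temperature, scale, stuff)
```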

3. Getting object metadata

Each object in PyTables has metadata about the data in the file. Normally this meta-information is accessible through the node instance variables. Let's take a look at some examples:

    >>> print("Object:", table)
    Object: /detector/readout (Table(10,)) 'Readout example'
    >>> print("Table name:", table.name)
    Table name: readout
    >>> print("Table title:", table.title)
    Table title: Readout example
    >>> print("Number of rows in table:", table.nrows)
    Number of rows in table: 10
    >>> print("Table variable names with their type and shape:")
    Table variable names with their type and shape:
    >>> for name in table.colnames:
    ...     print(name, ':= %s, %s' % (table.coldtypes[name], table.coldtypes[name].shape))
    ADCcount := uint16, ()
    TDCcount := uint8, ()
    energy := float64, ()
    grid_i := int32, ()
    grid_j := int32, ()
    idnumber := int64, ()
    name := |S16, ()
    pressure := float32, ()

Here, the name, title, nrows, colnames and coldtypes attributes of the Table object (see Table for a complete list of attributes) give us quite a bit of information about the table data.

You can interactively retrieve general information about the public objects in PyTables by asking for help:

    >>> help(table)
    Help on Table in module tables.table:
    class Table(tableextension.Table, tables.leaf.Leaf)
    |  This class represents heterogeneous datasets in an HDF5 file.
    |
    |  Tables are leaves (see the `Leaf` class) whose data consists of a
    |  unidimensional sequence of *rows*, where each row contains one or
    |  more *fields*.  Fields have an associated unique *name* and
    |  *position*, with the first field having position 0.  All rows have
    |  the same fields, which are arranged in *columns*.
    [snip]
    |
    |  Instance variables
    |  ------------------
    |
    |  The following instance variables are provided in addition to those
    |  in `Leaf`.  Please note that there are several `col` dictionaries
    |  to ease retrieving information about a column directly by its path
    |  name, avoiding the need to walk through `Table.description` or
    |  `Table.cols`.
    |
    |  autoindex
    |      Automatically keep column indexes up to date?
    |
    |      Setting this value states whether existing indexes should be
    |      automatically updated after an append operation or recomputed
    |      after an index-invalidating operation (i.e. removal and
    |      modification of rows).  The default is true.
    [snip]
    |  rowsize
    |      The size in bytes of each row in the table.
    |
    |  Public methods -- reading
    |  -------------------------
    |
    |  * col(name)
    |  * iterrows([start][, stop][, step])
    |  * itersequence(sequence)
    |  * itersorted(sortby[, checkCSI][, start][, stop][, step])
    |  * read([start][, stop][, step][, field][, coords])
    |  * read_coordinates(coords[, field])
    |  * read_sorted(sortby[, checkCSI][, field,][, start][, stop][, step])
    |  * __getitem__(key)
    |  * __iter__()
    |
    |  Public methods -- writing
    |  -------------------------
    |
    |  * append(rows)
    |  * modify_column([start][, stop][, step][, column][, colname])
    [snip]

Try getting help on other object docs by yourself:

    >>> help(h5file)
    >>> help(table.remove_rows)

To examine the metadata of the /columns/pressure Array object:

    >>> pressureObject = h5file.get_node("/columns", "pressure")
    >>> print("Info on the object:", repr(pressureObject))
    Info on the object: /columns/pressure (Array(3,)) 'Pressure column selection'
      atom := Float64Atom(shape=(), dflt=0.0)
      maindim := 0
      flavor := 'numpy'
      byteorder := 'little'
      chunkshape := None
    >>> print(" shape: ==>", pressureObject.shape)
     shape: ==> (3,)
    >>> print(" title: ==>", pressureObject.title)
     title: ==> Pressure column selection
    >>> print(" atom: ==>", pressureObject.atom)
     atom: ==> Float64Atom(shape=(), dflt=0.0)

Note that this time we used File.get_node() rather than natural naming to access the node. The advantage of File.get_node() is that it can get a node from a pathname string (as in this example), and it can also act as a filter, returning the node at a given location only if it is an instance of the class given by classname. In general, however, I consider natural naming to be more elegant and easier to use, especially if you are using the name completion capability present in interactive consoles.

If you look at the atom attribute of pressureObject (printed above), you can verify that it is a "float64" array. By looking at its shape attribute, you can deduce that the array on disk is one-dimensional and has 3 elements. See Array or the internal docstrings for the complete list of Array attributes.

4. Reading data from Array objects

Once you have found the desired Array, use the read() method of the Array object to retrieve its data:

    >>> pressureArray = pressureObject.read()
    >>> pressureArray
    array([ 25., 36., 49.])
    >>> print("pressureArray is an object of type:", type(pressureArray))
    pressureArray is an object of type: <type 'numpy.ndarray'>
    >>> nameArray = h5file.root.columns.name.read()
    >>> print("nameArray is an object of type:", type(nameArray))
    nameArray is an object of type: <type 'list'>
    >>>
    >>> print("Data on arrays nameArray and pressureArray:")
    Data on arrays nameArray and pressureArray:
    >>> for i in range(pressureObject.shape[0]):
    ...     print(nameArray[i], "-->", pressureArray[i])
    Particle:      5 --> 25.0
    Particle:      6 --> 36.0
    Particle:      7 --> 49.0

You can see that the Array.read() method returns an authentic NumPy object for the pressureObject instance, whereas for the /columns/name array it returns a Python list (of strings), which we bound to nameArray.

The type of the object that was saved is stored as an HDF5 attribute (named FLAVOR) of the object on disk. This attribute is then read as Array meta-information (accessible through the Array.attrs.FLAVOR variable), enabling the read array to be converted back into the original kind of object. This provides a means to save a large variety of objects as arrays, with the guarantee that you will later be able to recover them in their original form. See File.create_array() for a complete list of the objects supported by the Array object class.

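Committing data to tables and arrays

We have seen how to create tables and arrays and how to browse both data and metadata in the object tree. Let us now examine one of the most powerful capabilities of PyTables, namely how to modify already created tables and arrays¹.

1. Appending data to an existing table

Now, let's have a look at how we can add records to an existing table on disk. Let's append some new values to the readout table object:

    >>> table = h5file.root.detector.readout
    >>> particle = table.row
    >>> for i in range(10, 15):
    ...     particle['name'] = f'Particle: {i:6d}'
    ...     particle['TDCcount'] = i % 256
    ...     particle['ADCcount'] = (i * 256) % (1 << 16)
    ...     particle['grid_i'] = i
    ...     particle['grid_j'] = 10 - i
    ...     particle['pressure'] = float(i*i)
    ...     particle['energy'] = float(particle['pressure'] ** 4)
    ...     particle['idnumber'] = i * (2 ** 34)
    ...     particle.append()
    >>> table.flush()

This is the same method we used to fill a new table. PyTables knows that this table is on disk, and when you add new records, they are appended to the end of the table².

If you look carefully at the code, you will see that we used the table.row attribute to create a table row and fill it with the new values. Each time its append() method is called, the actual row is committed to the output buffer and the row pointer is incremented to point to the next table record. When the buffer is full, the data is saved on disk, and the buffer is reused for the next cycle.
Caveat emptor: do not forget to always call the flush() method after a write operation, or else your tables will not be updated!

Let's have a look at some rows in the modified table and verify that our new data has been appended:

    >>> for r in table.iterrows():
    ...     print("%-16s | %11.1f | %11.4g | %6d | %6d | %8d |" % \
    ...         (r['name'], r['pressure'], r['energy'], r['grid_i'], r['grid_j'],
    ...         r['TDCcount']))
    Particle:      0 |         0.0 |           0 |      0 |     10 |        0 |
    Particle:      1 |         1.0 |           1 |      1 |      9 |        1 |
    Particle:      2 |         4.0 |         256 |      2 |      8 |        2 |
    Particle:      3 |         9.0 |        6561 |      3 |      7 |        3 |
    Particle:      4 |        16.0 |   6.554e+04 |      4 |      6 |        4 |
    Particle:      5 |        25.0 |   3.906e+05 |      5 |      5 |        5 |
    Particle:      6 |        36.0 |    1.68e+06 |      6 |      4 |        6 |
    Particle:      7 |        49.0 |   5.765e+06 |      7 |      3 |        7 |
    Particle:      8 |        64.0 |   1.678e+07 |      8 |      2 |        8 |
    Particle:      9 |        81.0 |   4.305e+07 |      9 |      1 |        9 |
    Particle:     10 |       100.0 |       1e+08 |     10 |      0 |       10 |
    Particle:     11 |       121.0 |   2.144e+08 |     11 |     -1 |       11 |
    Particle:     12 |       144.0 |     4.3e+08 |     12 |     -2 |       12 |
    Particle:     13 |       169.0 |   8.157e+08 |     13 |     -3 |       13 |
    Particle:     14 |       196.0 |   1.476e+09 |     14 |     -4 |       14 |
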

2. Modifying data in tables

OK, until now we have only been reading and writing (appending) values in our tables. But there are times when you need to modify your data once it has been saved on disk, so let's see how to modify the values saved in an existing table. We will start by modifying single cells in the first row of the Particle table:

    >>> print("Before modif-->", table[0])
    Before modif--> (0, 0, 0.0, 0, 10, 0L, 'Particle:      0', 0.0)
    >>> table.cols.TDCcount[0] = 1
    >>> print("After modifying first row of TDCcount-->", table[0])
    After modifying first row of TDCcount--> (0, 1, 0.0, 0, 10, 0L, 'Particle:      0', 0.0)
    >>> table.cols.energy[0] = 2
    >>> print("After modifying first row of energy-->", table[0])
    After modifying first row of energy--> (0, 1, 2.0, 0, 10, 0L, 'Particle:      0', 0.0)

We can modify complete ranges of columns as well:

    >>> table.cols.TDCcount[2:5] = [2,3,4]
    >>> print("After modifying slice [2:5] of TDCcount-->", table[0:5])
    After modifying slice [2:5] of TDCcount-->
    [(0, 1, 2.0, 0, 10, 0L, 'Particle:      0', 0.0)
     (256, 1, 1.0, 1, 9, 17179869184L, 'Particle:      1', 1.0)
     (512, 2, 256.0, 2, 8, 34359738368L, 'Particle:      2', 4.0)
     (768, 3, 6561.0, 3, 7, 51539607552L, 'Particle:      3', 9.0)
     (1024, 4, 65536.0, 4, 6, 68719476736L, 'Particle:      4', 16.0)]
    >>> table.cols.energy[1:9:3] = [2,3,4]
    >>> print("After modifying slice [1:9:3] of energy-->", table[0:9])
    After modifying slice [1:9:3] of energy-->
    [(0, 1, 2.0, 0, 10, 0L, 'Particle:      0', 0.0)
     (256, 1, 2.0, 1, 9, 17179869184L, 'Particle:      1', 1.0)
     (512, 2, 256.0, 2, 8, 34359738368L, 'Particle:      2', 4.0)
     (768, 3, 6561.0, 3, 7, 51539607552L, 'Particle:      3', 9.0)
     (1024, 4, 3.0, 4, 6, 68719476736L, 'Particle:      4', 16.0)
     (1280, 5, 390625.0, 5, 5, 85899345920L, 'Particle:      5', 25.0)
     (1536, 6, 1679616.0, 6, 4, 103079215104L, 'Particle:      6', 36.0)
     (1792, 7, 4.0, 7, 3, 120259084288L, 'Particle:      7', 49.0)
     (2048, 8, 16777216.0, 8, 2, 137438953472L, 'Particle:      8', 64.0)]

Check that the values have been correctly modified!

Hint

Remember that TDCcount is the second column and energy is the third. Look for more information on modifying columns in Column.__setitem__().

PyTables also lets you modify complete sets of rows at the same time. As a demonstration of this capability, see the next example:

    >>> table.modify_rows(start=1, step=3,
    ...                   rows=[(1, 2, 3.0, 4, 5, 6, 'Particle: None', 8.0),
    ...                         (2, 4, 6.0, 8, 10, 12, 'Particle: None*2', 16.0)])
    2
    >>> print("After modifying the second and fifth rows-->", table[0:5])
    After modifying the second and fifth rows-->
    [(0, 1, 2.0, 0, 10, 0L, 'Particle:      0', 0.0)
     (1, 2, 3.0, 4, 5, 6L, 'Particle: None', 8.0)
     (512, 2, 256.0, 2, 8, 34359738368L, 'Particle:      2', 4.0)
     (768, 3, 6561.0, 3, 7, 51539607552L, 'Particle:      3', 9.0)
     (2, 4, 6.0, 8, 10, 12L, 'Particle: None*2', 16.0)]

As you can see, the modify_rows() call has modified the second and fifth rows, and it returned the number of modified rows.

Apart from Table.modify_rows(), there is another method, called Table.modify_column(), for modifying specific columns.

Finally, there is another way of modifying tables that is generally handier than the ones described above. It uses the Row.update() method of the Row instance attached to every table, so it is meant to be used in table iterators. Look at the following example:

    >>> for row in table.where('TDCcount <= 2'):
    ...     row['energy'] = row['TDCcount']*2
    ...     row.update()
    >>> print("After modifying energy column (where TDCcount <=2)-->", table[0:4])
    After modifying energy column (where TDCcount <=2)-->
    [(0, 1, 2.0, 0, 10, 0L, 'Particle:      0', 0.0)
     (1, 2, 4.0, 4, 5, 6L, 'Particle: None', 8.0)
     (512, 2, 4.0, 2, 8, 34359738368L, 'Particle:      2', 4.0)
     (768, 3, 6561.0, 3, 7, 51539607552L, 'Particle:      3', 9.0)]

Note: the authors find this way of updating tables (i.e. using Row.update()) to be both convenient and efficient. Please make sure to use it extensively.

Caveat emptor: currently, Row.update() will not work (the table will not be updated) if the loop is interrupted with a break statement. A possible workaround consists of manually flushing the row internal buffer by calling row._flushModRows() just before the break statement.
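As a compact check of the Row.update() pattern, the sketch below builds a hypothetical five-row table, doubles TDCcount into energy for the rows selected by an in-kernel condition, and reads the column back. The loop runs to completion, so the break caveat above does not apply here.

```python
import os
import tempfile

import tables


class Particle(tables.IsDescription):   # reduced description for this sketch
    TDCcount = tables.UInt8Col()
    energy = tables.Float64Col()


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "update_demo.h5")
    with tables.open_file(path, mode="w") as h5file:
        table = h5file.create_table("/", "readout", Particle)
        row = table.row
        for i in range(5):
            row["TDCcount"] = i
            row["energy"] = 100.0 + i
            row.append()
        table.flush()

        # Modify only the rows selected by the condition, in place.
        for row in table.where("TDCcount <= 2"):
            row["energy"] = row["TDCcount"] * 2
            row.update()
        table.flush()

        energies = [float(e) for e in table.cols.energy[:]]

print(energies)
```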

3. Modifying data in arrays

We are now going to see how to modify data in array objects. The basic way to do this is through the Array.__setitem__() special method. Let's see how to modify data in the pressureObject array:

    >>> pressureObject = h5file.root.columns.pressure
    >>> print("Before modif-->", pressureObject[:])
    Before modif--> [ 25. 36. 49.]
    >>> pressureObject[0] = 2
    >>> print("First modif-->", pressureObject[:])
    First modif--> [ 2. 36. 49.]
    >>> pressureObject[1:3] = [2.1, 3.5]
    >>> print("Second modif-->", pressureObject[:])
    Second modif--> [ 2. 2.1 3.5]
    >>> pressureObject[::2] = [1,2]
    >>> print("Third modif-->", pressureObject[:])
    Third modif--> [ 1. 2.1 2. ]

In general, you can use any combination of (multidimensional) extended slicing.

The only exception is that you cannot use negative values for step to refer to the indexes you want to modify. See Array.__getitem__() for more examples of how to use extended slicing in PyTables objects.
Similarly, with arrays of strings:

    >>> nameObject = h5file.root.columns.name
    >>> print("Before modif-->", nameObject[:])
    Before modif--> ['Particle:      5', 'Particle:      6', 'Particle:      7']
    >>> nameObject[0] = 'Particle: None'
    >>> print("First modif-->", nameObject[:])
    First modif--> ['Particle: None', 'Particle:      6', 'Particle:      7']
    >>> nameObject[1:3] = ['Particle: 0', 'Particle: 1']
    >>> print("Second modif-->", nameObject[:])
    Second modif--> ['Particle: None', 'Particle: 0', 'Particle: 1']
    >>> nameObject[::2] = ['Particle: -3', 'Particle: -5']
    >>> print("Third modif-->", nameObject[:])
    Third modif--> ['Particle: -3', 'Particle: 0', 'Particle: -5']


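4. Deleting rows from a table

Finally, let's remove some rows from the table. Suppose we want to delete rows 5 through 9 (inclusive):

    >>> table.remove_rows(5,10)
    5

Table.remove_rows() deletes the rows in the half-open range [start, stop), so the row at stop itself is kept. It returns the number of rows effectively removed.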
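The half-open range can be verified with a small sketch such as the following, which fills a hypothetical single-column table with 15 rows, removes rows 5 to 9, and lists what is left.

```python
import os
import tempfile

import tables


class Particle(tables.IsDescription):   # reduced description for this sketch
    idnumber = tables.Int64Col()


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "remove_demo.h5")
    with tables.open_file(path, mode="w") as h5file:
        table = h5file.create_table("/", "readout", Particle)
        row = table.row
        for i in range(15):
            row["idnumber"] = i
            row.append()
        table.flush()

        # Removes rows 5, 6, 7, 8 and 9; row 10 (the stop index) is kept.
        removed = table.remove_rows(5, 10)
        remaining = [int(v) for v in table.cols.idnumber[:]]
        nrows_after = table.nrows

print(removed, nrows_after, remaining)
```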

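Don't forget to close the file when you have finished:

    >>> h5file.close()
    >>> ^D
    $

In Figure 2 you can see a graphical view of the PyTables file with the datasets we have just created. Figure 3 shows the general properties of the /detector/readout table.

Figure 2. The final version of the data file for tutorial 1.

Figure 3. General properties of the /detector/readout table.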
