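The Simplest Database Implementation and Its Data Structures

Of all application software, the database may well be the most complex.

The MySQL manual runs to more than 3,000 pages and the PostgreSQL manual to more than 2,000; Oracle's manuals are thicker than the two combined.

Writing the simplest possible database yourself, however, is not hard. A post on Reddit explained the principles clearly in just a few hundred words, and what follows is my summary of it.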

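1. Storing data as text

The first step is to write the data you want to keep into a text file. That text file is your database.

To make the data easy to read back, it must be divided into records, and every record must have the same fixed length. For example, if each record is 800 bytes long, the fifth record starts at byte offset 3200 (counting from 0), because four 800-byte records precede it.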
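A minimal sketch of fixed-length records follows. It assumes 800-byte records padded with spaces, 1-based record numbers and UTF-8 text; the helper names (`record_offset`, `write_record`, `read_record`) are hypothetical, and an in-memory `io.BytesIO` stands in for the database file.

```python
import io

RECORD_SIZE = 800  # assumed fixed record length in bytes


def record_offset(n: int, size: int = RECORD_SIZE) -> int:
    """Byte offset at which 1-based record number n starts."""
    return (n - 1) * size


def write_record(f, n: int, text: str, size: int = RECORD_SIZE) -> None:
    """Write record n, padded with spaces to exactly `size` bytes."""
    data = text.encode("utf-8")
    if len(data) > size:
        raise ValueError("record longer than the fixed record size")
    f.seek(record_offset(n, size))
    f.write(data.ljust(size, b" "))


def read_record(f, n: int, size: int = RECORD_SIZE) -> str:
    """Seek straight to record n and return it without the padding."""
    f.seek(record_offset(n, size))
    return f.read(size).decode("utf-8").rstrip(" ")


db = io.BytesIO()  # stands in for the database file on disk
for i in range(1, 6):
    write_record(db, i, f"{i},employee {i}")

print(record_offset(5))    # 3200
print(read_record(db, 5))  # 5,employee 5
```

Because every record has the same length, reading record n costs a single seek no matter how large n is; the price is wasted padding when records are short (and, in this sketch, trailing spaces inside a record are lost).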

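Most of the time we don't know a record's position, only the value of its primary key. To read the record we could compare the records one by one, but that is far too slow; in practice, databases usually store their data in a B-tree.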
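For contrast, here is what comparing records one by one looks like. This sketch assumes, hypothetically, that the primary key is the first comma-separated field of each fixed-length 800-byte record; `full_scan` is an illustrative name, and an in-memory buffer stands in for the file.

```python
import io

RECORD_SIZE = 800  # assumed fixed record length in bytes


def full_scan(f, key: str, size: int = RECORD_SIZE):
    """Compare records one by one until the primary key matches.

    Returns (record_number, record_text), or None if no record matches.
    The record number is also how many records had to be read.
    """
    f.seek(0)
    n = 0
    while True:
        chunk = f.read(size)
        if len(chunk) < size:  # end of file: no record has this key
            return None
        n += 1
        text = chunk.decode("utf-8").rstrip(" ")
        if text.split(",", 1)[0] == key:  # first field holds the primary key
            return n, text


# A small "database" of 1,000 records with keys E0001 .. E1000.
db = io.BytesIO(b"".join(
    f"E{i:04d},employee {i}".encode("utf-8").ljust(RECORD_SIZE, b" ")
    for i in range(1, 1001)
))

print(full_scan(db, "E0990"))  # (990, 'E0990,employee 990')
```

Reaching record 990 costs 990 reads; the work grows in proportion to the size of the table.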

2. What is a B-tree?

To understand B-trees, we have to start with the binary search tree.

[Figure: a binary search tree]

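A binary search tree is a data structure that makes searching very efficient. It has three properties.

(1) Each node has at most two subtrees.

(2) Every value in a node's left subtree is smaller than the node's value, and every value in its right subtree is larger.

(3) Finding a target value among n nodes usually takes only about log2(n) comparisons, provided the tree is reasonably balanced.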

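The binary search tree is not well suited to databases, because the cost of a search depends on the number of levels: the deeper a value sits, the more comparisons it takes to reach it. In the worst case (for example, when values are inserted in sorted order), finding a value among n items takes n comparisons. For a database this is fatal, because every level it descends means another read from disk, and reading from disk takes far longer than processing the data. A database should read from disk as few times as possible.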
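The sketch below illustrates both points with a plain, unbalanced binary search tree, using a comparison counter as a stand-in for disk reads. The class and function names are hypothetical. Inserting keys in sorted order produces the extreme case described above, where finding the last of n keys takes n comparisons.

```python
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None   # subtree of smaller keys
        self.right = None  # subtree of larger keys


def insert(root, key):
    """Insert iteratively, so a degenerate tree cannot hit the recursion limit."""
    node = Node(key)
    if root is None:
        return node
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = node
                return root
            cur = cur.left
        elif key > cur.key:
            if cur.right is None:
                cur.right = node
                return root
            cur = cur.right
        else:
            return root  # duplicate key: ignore it


def search(root, key):
    """Return (found, comparisons), counting one comparison per level visited."""
    comparisons = 0
    cur = root
    while cur is not None:
        comparisons += 1
        if key == cur.key:
            return True, comparisons
        cur = cur.left if key < cur.key else cur.right
    return False, comparisons


n = 1000
keys = list(range(n))

sorted_root = None
for k in keys:       # sorted input: every node gets only a right child
    sorted_root = insert(sorted_root, k)

random.seed(0)
shuffled = keys[:]
random.shuffle(shuffled)
random_root = None
for k in shuffled:   # random input: the tree is usually fairly balanced
    random_root = insert(random_root, k)

print(search(sorted_root, n - 1))  # (True, 1000)
print(search(random_root, n - 1))  # far fewer comparisons; the count depends on the shuffle
```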

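The B-tree is an improvement on the binary search tree. Its design idea is to keep related data together as much as possible, so that a single read fetches many values and the number of disk operations goes down.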

[Figure: a B-tree]

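A B-tree also has three properties.

(1) A node can hold several values. In the figure above, for example, the fullest node holds 4 values.

(2) A new level is added only when the existing nodes are full. In other words, a B-tree keeps the number of levels as small as possible.

(3) The values in child nodes have a strict ordering relationship with the values in their parent. In general, a parent node with a values has a + 1 children. In the figure above, for example, the parent node holds two values (7 and 16) and therefore has three children: the first contains only values smaller than 7, the last only values larger than 16, and the middle one the values between 7 and 16.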
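These three properties are enough to write the search half of a B-tree. The sketch below is hypothetical: it hand-builds a two-level tree whose root holds 7 and 16 as in the figure (the leaf values are invented), and it leaves out insertion and node splitting, which are considerably more involved.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted values stored in this node
        self.children = children or []  # len(keys) + 1 child nodes; empty for a leaf


def btree_search(node, key):
    """Return (found, nodes_visited); if each node is one disk read, visits = reads."""
    visited = 0
    while True:
        visited += 1
        i = bisect_left(node.keys, key)  # index of the first stored value >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True, visited
        if not node.children:            # a leaf without a match: key is absent
            return False, visited
        # Child 0 holds values below keys[0], child len(keys) values above keys[-1],
        # and any child i in between holds values between keys[i-1] and keys[i].
        node = node.children[i]


# Root from the figure (7 and 16); the leaf values are made up for illustration.
root = BTreeNode([7, 16], [
    BTreeNode([1, 2, 5, 6]),  # values smaller than 7
    BTreeNode([9, 12]),       # values between 7 and 16
    BTreeNode([18, 21]),      # values larger than 16
])

print(btree_search(root, 12))  # (True, 2)
print(btree_search(root, 3))   # (False, 2)
```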

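This structure greatly reduces the number of disk reads. Suppose a node can hold 100 values. Then a three-level B-tree can hold about one million values (roughly 100 × 100 × 100), whereas a balanced binary search tree would need 20 levels for the same data, since 2^20 is about one million. If the operating system reads one node at a time and the root node stays in memory, a B-tree can find any value among one million with only two disk reads.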
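The arithmetic behind these numbers can be checked with a few lines. The sketch assumes every node is completely full (100 values, hence 101 children), which gives the maximum capacity; real B-trees are usually only partly full.

```python
import math

VALUES_PER_NODE = 100         # node capacity assumed in the text
FANOUT = VALUES_PER_NODE + 1  # a node with a values has a + 1 children


def btree_capacity(levels: int) -> int:
    """Maximum number of values in a completely full B-tree with `levels` levels."""
    # Level k (0 = root) has FANOUT**k nodes, each holding VALUES_PER_NODE values.
    return sum(VALUES_PER_NODE * FANOUT**k for k in range(levels))


def bst_levels(n: int) -> int:
    """Levels a perfectly balanced binary search tree needs to hold n values."""
    # A perfect tree with L levels holds 2**L - 1 values.
    return math.ceil(math.log2(n + 1))


print(btree_capacity(3))      # 1030300
print(bst_levels(1_000_000))  # 20
```

With the root cached in memory, only the two lower levels of the three-level tree need a disk read.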

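3. Indexes

Storing the data in a B-tree only solves the problem of looking records up by primary key. To search on other fields, you need to build an index.

An index is simply a B-tree file keyed on some field. Suppose an "employees" table has two fields: employee ID (the primary key) and name. You can build an index file on the name field: it stores the names in B-tree form, and each name is followed by the position of its row in the database (that is, its record number). To look up a name, you first find the record number in the index and then read that record from the table.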
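A sketch of this two-step lookup follows. For brevity it uses a sorted Python list searched with `bisect` as a stand-in for the on-disk B-tree index file; the table contents and the function names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical "employees" table: record number -> (employee_id, name).
table = {
    1: ("E001", "Zhang Wei"),
    2: ("E002", "Li Na"),
    3: ("E003", "Wang Fang"),
    4: ("E004", "Li Na"),
}


def build_name_index(table):
    """Sorted (name, record_number) pairs; stands in for a B-tree index file."""
    return sorted((name, rec) for rec, (_emp_id, name) in table.items())


def lookup_by_name(index, table, name):
    """Step 1: find record numbers in the index. Step 2: read those records."""
    i = bisect_left(index, (name,))  # first entry whose name is >= `name`
    rows = []
    while i < len(index) and index[i][0] == name:
        rows.append(table[index[i][1]])
        i += 1
    return rows


index = build_name_index(table)
print(lookup_by_name(index, table, "Li Na"))  # [('E002', 'Li Na'), ('E004', 'Li Na')]
```

Unlike a primary key, a name can repeat, so the lookup collects every matching index entry rather than stopping at the first one.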

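This way of looking data up through an index is called the Indexed Sequential Access Method, or ISAM. Several implementations exist (for example the C-ISAM and D-ISAM libraries), and with one of these libraries you can write the simplest kind of database yourself.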

4. Advanced features

Once basic data access (including indexes) is in place, you can go on to implement some advanced features.

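(1) SQL is the common language for operating databases, so you need an SQL parser that translates SQL statements into the corresponding ISAM operations.

(2) A join connects two tables of the database, usually through a foreign key. You need to optimize this operation, for example by deciding which table to read first.

(3) A transaction is a batch of database operations that must succeed or fail as a whole: if any step fails, the entire transaction fails. You therefore need an operation log so that a failed transaction can be rolled back.

(4) Backup: keep copies of the database.

(5) Remote access: let users operate the database from other machines over TCP/IP.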
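The transaction requirement in item (3) can be sketched with an in-memory undo log. This is a simplified illustration under stated assumptions: a dictionary stands in for the data file, only one transaction runs at a time, and the class and method names are hypothetical.

```python
_MISSING = object()  # marks keys that did not exist before the transaction


class Database:
    """Toy key-value store whose transactions are backed by an in-memory undo log."""

    def __init__(self):
        self.data = {}        # stands in for the data file
        self.undo_log = None  # list of (key, old_value) while a transaction is open

    def begin(self):
        self.undo_log = []

    def put(self, key, value):
        if self.undo_log is not None:
            # Log the old value *before* overwriting it, so it can be restored.
            self.undo_log.append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def commit(self):
        self.undo_log = None  # the changes stay; the log is no longer needed

    def rollback(self):
        # Undo in reverse order to get back to the state at begin().
        for key, old in reversed(self.undo_log):
            if old is _MISSING:
                del self.data[key]
            else:
                self.data[key] = old
        self.undo_log = None


def transfer(db, src, dst, amount):
    """Two steps that must succeed or fail together."""
    db.begin()
    try:
        db.put(src, db.data[src] - amount)
        if db.data[src] < 0:
            raise ValueError("insufficient funds")
        db.put(dst, db.data.get(dst, 0) + amount)
        db.commit()
    except Exception:
        db.rollback()
        raise


db = Database()
db.put("alice", 100)
db.put("bob", 0)
transfer(db, "alice", "bob", 30)
try:
    transfer(db, "alice", "bob", 500)  # fails after the first step has already run
except ValueError:
    pass
print(db.data)  # {'alice': 70, 'bob': 30}
```

A real engine would typically write such a log to disk before touching the data file, so that a crash in the middle of a transaction can also be recovered; this sketch keeps everything in memory.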

(End)

--------------------------------------------------------------------------------------------------------------------------------------------------------------------

How do you build a database? (Reddit, r/Database)

It's a great question, and it deserves a long answer.

Most database servers are written in C and store data using B-tree-type structures. In the old days there was a product called C-ISAM (a C library implementing the indexed sequential access method), a low-level library that helps C programmers write data in B-tree format. So you need to know what B-trees are and understand how they work.

Most databases store data separately from indexes. Let's assume a record (or row) is 800 bytes long and you write 5 rows of data to a file. If each row contains columns such as first name, last name and address, and you want to find a specific record by last name, you could open the file and search through the records sequentially, but this is very slow. Instead, you open an index file that contains just the last name and the position of the record in the data file. Once you have the position, you open the data file, lseek to that position and read the data. Because index entries are very small, searching an index file is much quicker. And because index files are stored as B-trees, it is very quick to do what is effectively a divide-and-conquer search to find the record you are looking for.
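Here is a minimal sketch of the data-file-plus-index-file idea. It is written in Python but uses `os.lseek` and `os.read`, which wrap the same `lseek`/`read` calls a C program would use. The file names, the 28-byte index entry layout (20-byte last name plus 8-byte offset) and the assumption that last names are unique and at most 20 bytes are all hypothetical, and the index is a sorted flat file searched by halving rather than a real B-tree.

```python
import os
import struct
import tempfile

ROW_SIZE = 800                        # row length used in the post
INDEX_ENTRY = struct.Struct("<20sQ")  # hypothetical: 20-byte last name + 8-byte offset


def write_files(rows, data_path, index_path):
    """Write fixed-length rows, then an index of (last_name, offset) sorted by name.

    Assumes last names are unique and at most 20 bytes of UTF-8.
    """
    entries = []
    with open(data_path, "wb") as data:
        for pos, (last_name, rest) in enumerate(rows):
            data.write(f"{last_name},{rest}".encode("utf-8").ljust(ROW_SIZE, b" "))
            entries.append((last_name.encode("utf-8"), pos * ROW_SIZE))
    with open(index_path, "wb") as index:
        for name, offset in sorted(entries):  # sorted so it can be binary-searched
            index.write(INDEX_ENTRY.pack(name, offset))


def find_offset(index_fd, count, name):
    """Divide-and-conquer search over the sorted index file.

    Returns the row's byte offset in the data file, or None if not found.
    """
    target = name.encode("utf-8")
    lo, hi = 0, count - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        os.lseek(index_fd, mid * INDEX_ENTRY.size, os.SEEK_SET)
        key, offset = INDEX_ENTRY.unpack(os.read(index_fd, INDEX_ENTRY.size))
        key = key.rstrip(b"\0")  # struct pads short names with NUL bytes
        if key == target:
            return offset
        if key < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


def get_row(data_fd, offset):
    """lseek to the row's position in the data file and read one row."""
    os.lseek(data_fd, offset, os.SEEK_SET)
    return os.read(data_fd, ROW_SIZE).decode("utf-8").rstrip(" ")


def lookup(rows, last_name):
    """Write both files to a temporary directory, then look one last name up."""
    with tempfile.TemporaryDirectory() as tmp:
        data_path = os.path.join(tmp, "people.dat")
        index_path = os.path.join(tmp, "people_lastname.idx")
        write_files(rows, data_path, index_path)
        count = os.path.getsize(index_path) // INDEX_ENTRY.size
        index_fd = os.open(index_path, os.O_RDONLY)
        data_fd = os.open(data_path, os.O_RDONLY)
        try:
            offset = find_offset(index_fd, count, last_name)
            return offset, (None if offset is None else get_row(data_fd, offset))
        finally:
            os.close(index_fd)
            os.close(data_fd)


rows = [("Smith", "John,1 Main St"), ("Jones", "Mary,2 High St"), ("Brown", "Ann,3 Low Rd")]
print(lookup(rows, "Jones"))  # (800, 'Jones,Mary,2 High St')
```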

So for one "table" you will have a data file with the data and one (or many) index files. The first index file could be for last name, the next could be for searching by Social Security number, and so on. When the user defines a query to get some data, they decide which index file to search through. If you can find any information on C-ISAM (there used to be an open-source, or cheap commercial, version called D-ISAM), you will understand this concept quite well.

Once you have stored data and have index files, an ISAM-type approach lets you GET a record based on a value or PUT a new record. However, modern database servers all support SQL, so you need an SQL parser that translates each SQL statement into a sequence of related GETs. SQL may join two tables, so you also need an optimizer to decide which table to read first (normally based on the number of rows in each table and the indexes available) and how to relate it to the next table. SQL can INSERT data, so you need to translate inserts into PUT operations; it can also combine multiple INSERTs into a transaction, so you need a transaction manager to control this, and transaction logs to record in-progress and completed transactions.
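To make the "translate SQL into GETs and PUTs" step concrete, here is a deliberately tiny sketch that recognizes only two statement shapes with regular expressions. A real SQL parser uses a proper grammar; the operation tuples returned here are a hypothetical format, not the C-ISAM API.

```python
import re

# Only two statement shapes are recognised; everything else is rejected.
SELECT_RE = re.compile(
    r"^\s*SELECT\s+\*\s+FROM\s+(\w+)\s+WHERE\s+(\w+)\s*=\s*'([^']*)'\s*;?\s*$",
    re.IGNORECASE,
)
INSERT_RE = re.compile(
    r"^\s*INSERT\s+INTO\s+(\w+)\s+VALUES\s*\((.*)\)\s*;?\s*$",
    re.IGNORECASE,
)


def translate(sql):
    """Turn a tiny subset of SQL into ISAM-style operation tuples (hypothetical format)."""
    m = SELECT_RE.match(sql)
    if m:
        table, column, value = m.groups()
        # GET: search the index on `column` for `value`, then read the row.
        return ("GET", table, column, value)
    m = INSERT_RE.match(sql)
    if m:
        table, values = m.groups()
        # Naive split: values containing commas or quotes are not handled.
        fields = [v.strip().strip("'") for v in values.split(",")]
        return ("PUT", table, fields)
    raise ValueError(f"unsupported SQL: {sql!r}")


print(translate("SELECT * FROM employees WHERE last_name = 'Smith';"))
# ('GET', 'employees', 'last_name', 'Smith')
print(translate("insert into employees values ('E005', 'Smith')"))
# ('PUT', 'employees', ['E005', 'Smith'])
```

Supporting joins would mean producing several GETs per statement, and that is where the optimizer described above would decide their order.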

You may also need some backup/restore commands to back up your data files, your index files and perhaps your transaction log files. If you really want to go for it, you could write replication tools that read your transaction log and replay the transactions on a backup database on a different server. Note that if you want your client programs (for example an SQL UI like phpMyAdmin) to run on a separate machine from your database server, you will need to write a connection manager that sends the SQL requests over TCP/IP to your server, which then authenticates them using some credentials, parses each request, runs your GETs and sends the data back to the client.
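The replication idea can be sketched as replaying the primary's transaction log on a copy. This assumes a hypothetical log format of one JSON object per committed operation, with a Python list standing in for the log file and a dict for the backup database; the network transfer between the two servers is left out.

```python
import json


def append_to_log(log_lines, op, key, value=None):
    """Record a committed operation as one JSON line (hypothetical log format)."""
    log_lines.append(json.dumps({"op": op, "key": key, "value": value}))


def replay(log_lines, replica, start=0):
    """Apply log entries from position `start` onward to the replica.

    Returns the next position to read, so replication can resume where it stopped.
    """
    for line in log_lines[start:]:
        entry = json.loads(line)
        if entry["op"] == "PUT":
            replica[entry["key"]] = entry["value"]
        elif entry["op"] == "DELETE":
            replica.pop(entry["key"], None)
    return len(log_lines)


log = []     # stands in for the primary server's transaction log file
append_to_log(log, "PUT", "E001", "Smith")
append_to_log(log, "PUT", "E002", "Jones")

backup = {}  # stands in for the database on the backup server
pos = replay(log, backup)

append_to_log(log, "DELETE", "E001")
pos = replay(log, backup, pos)  # only the new entry is applied
print(backup, pos)  # {'E002': 'Jones'} 3
```

Remembering the log position is what lets the backup catch up incrementally instead of copying the whole database each time.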

So these database servers can be a lot of work, especially for one person. But you can create simple versions of these tools one at a time. Start with how to store data and indexes, and how to retrieve data using an ISAM-type interface.

There are books out there: look for older books on MySQL and mSQL, search Google for anything on B-trees and ISAM, and look for open-source C libraries that already implement ISAM. Get a good understanding of file I/O in C on a Linux machine. Many commercial databases now don't even use the filesystem for their data files, because of caching issues; they write directly to raw disk. Initially, you just want to write to files.

I hope this helps a little bit.