使用bitset實現毫秒級查詢


前言

因為業務要求api的一次請求響應時間在10ms以內,所以傳統的數據庫查詢操作直接被排除(網絡io和磁盤io)。通過調研,最終使用了bitset,目前已經正常運行了很久


bitset介紹

看JDK中的解釋簡直一頭霧水,用我自己的理解概括一下

  1. bitset的內部實現是long數組
  2. set中每一個位的默認值為false(0)
  3. bitset長度按需增長
  4. bitset非線程安全

bitset關鍵方法分析
    /**
     * Sets the bit at the specified index to {@code true}.
     *
     * @param  bitIndex a bit index
     * @throws IndexOutOfBoundsException if the specified index is negative
     * @since  JDK1.0
     */
    public void set(int bitIndex) {
        if (bitIndex < 0)
            throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);

        int wordIndex = wordIndex(bitIndex);
        expandTo(wordIndex);

        words[wordIndex] |= (1L << bitIndex); // Restores invariants

        checkInvariants();
    }

設置指定“位”為true,bitIndex參數為非負整數。假設我們執行以下代碼,觀察上面代碼中worIndex,words[wordIndex]值的變化

BitSet bs = new BitSet()
bs.set(0);
bs.set(1);
bs.set(2);
bs.set(3);
bs.set(4);
bitIndex wordIndex words[wordIndex] words[wordIndex]二進制表示
0 0 1 0001
1 0 3 0011
2 0 7 0111
3 0 15 1111
4 0 31 0001 1111

通過上表,我們可以很清晰的根據bitIndex和words[wordIndex]二進制值的對應關系,得到一個結論,即:bitset中每一個long可以表示64個非負整數在bitSet中存在與否。例如:0001可以表示整數0在bitset中存在,1111可以表示整數3,2,1,0在bitset中存在。


進入正題,實現bitset毫秒級查詢

想象一個場景,我們有一張user表,name唯一。

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `address` varchar(255) NOT NULL comment '地址',
  `gender` varchar(10) NOT NULL comment '性別',
  `age` varchar(10) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;

假設我們要查詢“北京市18歲的女生”,那么對應的sql如下:

select * from `user` where address='北京' and age='18' and gender='girl';

如何使用bitset實現同樣的查詢呢?

  1. 將user表數據加載進內存中
  2. 為user表建立address,age,gender維度的bitset索引
  3. 根據索引查詢數據
1.將user表數據加載進內存中

將user表從數據庫讀取出來存入List

User實體類:

public class User implements Cloneable {
	private String name;
	private String address;
	private String gender;
	private String age;

	@Override
	public String toString() {
		return "User [name=" + name + ", address=" + address + ", gender=" + gender + ", age=" + age + "]";
	}

	@Override
	public User clone() {
		User user = null;
		try {
			user = (User) super.clone();
		} catch (CloneNotSupportedException e) {
			e.printStackTrace();
		}
		return user;
	}
    //省略get set 方法。。。


2.建立索引

創建bitset索引模型類

public class BitSetIndexModel {
	private String type;//索引類型:address,age,gender
	private ConcurrentHashMap<String, Integer> map;//索引類型和bitSet在bsList中下標的映射關系
	private List<String> list;//索引類型的值集合,例如gender:girl,boy
	private List<BitSet> bsList;//bitset集合

	public BitSetIndex() {
		
	}

	public BitSetIndexModel(String type) {
		this.type = type;
		map = new ConcurrentHashMap<String, Integer>();
		list = new ArrayList<String>();
		bsList = new ArrayList<BitSet>();
	}
	
	public String getType() {
		return type;
	}

	public void setType(String type) {
		this.type = type;
	}

	public Map<String, Integer> getMap() {
		return map;
	}

	public void setMap(ConcurrentHashMap<String, Integer> map) {
		this.map = map;
	}

	public List<String> getList() {
		return list;
	}

	public void setList(List<String> list) {
		this.list = list;
	}

	public List<BitSet> getBsList() {
		return bsList;
	}

	public void setBsList(List<BitSet> bsList) {
		this.bsList = bsList;
	}

	/**
	 *
	 * @param str
	 * @param i
	 */
	public void createIndex(String str, int i) {
		BitSet bitset = null;
		//獲取‘str’對應的bitset在bsList中的下標
		Integer index = this.getMap().get(str);
		if (index != null) {
			bitset = this.getBsList().get(index);
			if (bitset == null) {
				bitset = new BitSet();
				this.getBsList().add(index, bitset);
			}
			bitset.set(i, true);// 將str對應的位置為true,true可省略
		} else {
			bitset = new BitSet();
			List<String> list = this.getList();
			list.add(str);
			index = list.size() - 1;
			bitset.set(i, true);
			this.getBsList().add(bitset);
			this.getMap().put(str, index);
		}
	}
	
	/**
	 * 從entity里拿出符合條件的bitset
	 *
	 * @param str
	 * @return
	 */
	public BitSet get(String str) {
		BitSet bitset = null;
		str = str.toLowerCase();
		Integer index = this.getMap().get(str);

		if (index != null) {
			bitset = this.getBsList().get(index);
		} else {
			bitset = new BitSet();
		}
		return bitset;
	}

	/**
	 * bitset的與運算
	 *
	 * @param str
	 * @param bitset
	 * @return
	 */
	public BitSet and(String str, BitSet bitset) {
		if (str != null) {
			str = str.toLowerCase();
			if (bitset != null) {
				bitset.and(get(str));
			} else {
				bitset = new BitSet();
				bitset.or(get(str));
			}
		}
		return bitset;
	}

	/**
	 * bitset的或運算
	 *
	 * @param str
	 * @param bitset
	 * @return
	 */
	public BitSet or(String str, BitSet bitset) {
		if (str != null) {
			str = str.toLowerCase();
			if (bitset != null) {
				bitset.or(get(str));
			} else {
				bitset = new BitSet();
				bitset.or(get(str));
			}
		}
		return bitset;
	}

	/**
	 * 獲取bitset值為true的 即 把 bitset翻譯為list的索引
	 *
	 * @param bitset
	 * @return
	 */
	public static List<Integer> getRealIndexs(BitSet bitset) {
		List<Integer> indexs = new ArrayList<Integer>();
		if (bitset != null) {
			int i = bitset.nextSetBit(0);
			if (i != -1) {
				indexs.add(i);
				for (i = bitset.nextSetBit(i + 1); i >= 0; i = bitset.nextSetBit(i + 1)) {
					int endOfRun = bitset.nextClearBit(i);
					do {
						indexs.add(i);
					} while (++i < endOfRun);
				}
			}
		}

		return indexs;
	}
	
}


為每一個user對象創建address,gender,age維度索引

public class UserIndexStore {

	private static final String ADDRESS = "address";
	private static final String GENDER = "gender";
	private static final String AGE = "age";
	private BitSetIndexModel address;
	private BitSetIndexModel gender;
	private BitSetIndexModel age;
	private ConcurrentHashMap<Integer, User> userMap;//存儲所有的user數據
        private ConcurrentHashMap<String, Integer> nameIndexMap;//name和index映射

	public static final UserIndexStore INSTANCE = getInstance();

	private UserIndexStore() {
		init();
	}

	public static UserIndexStore getInstance() {
		return UserIndexStoreHolder.instance;
	}

	private static class UserIndexStoreHolder {
		private static UserIndexStore instance = new UserIndexStore();
	}

	private void init() {
		this.address = new BitSetIndexModel(ADDRESS);
		this.gender = new BitSetIndexModel(GENDER);
		this.age = new BitSetIndexModel(AGE);
		userMap = new ConcurrentHashMap<Integer, User>();
                nameIndexMap =  new ConcurrentHashMap<String, Integer>();
	}

	/**
	 * 構建索引
	 * @param users 
	 */
	public void createIndex(List<User> users) {
		if (users != null && users.size() > 0) {
			for (int index = 0; index < users.size(); index++) {
				User user = users.get(index);
				createIndex(user, index);
			}
		}
	}

	private void createIndex(User user, int index) {
		getAddress().update(user.getAddress(), index);
		getGender().update(user.getGender(), index);
		getAge().update(user.getAge(), index);
		this.userMap.put(index, user);
		this.nameIndexMap.put(user.getName(), index);
	}

	public BitSet query(String address, String gender, String age) {
		BitSet bitset = null;
		bitset = getAddress().and(address, bitset);
		bitset = getGender().and(gender, bitset);
		bitset = getAge().and(age, bitset);
		return bitset;
	}

	public User findUser(Integer index) {
		User user = this.userMap.get(index);
		if (user != null) {
			return user.clone();//可能會對user做修改操作,要保證內存原始數據不變
		}
		return null;
	}

	public BitSetIndexModel getAddress() {
		return address;
	}

	public void setAddress(BitSetIndexModel address) {
		this.address = address;
	}

	public BitSetIndexModel getGender() {
		return gender;
	}

	public void setGender(BitSetIndexModel gender) {
		this.gender = gender;
	}

	public BitSetIndexModel getAge() {
		return age;
	}

	public void setAge(BitSetIndexModel age) {
		this.age = age;
	}

}

3.測試bitset
public class BitSetTest {

	public static void main(String[] args) {
		List<User> users = buildData();
		UserIndexStore.getInstance().createIndex(users);
		ExecutorService executorService = Executors.newFixedThreadPool(20);
		int num = 2000;
		long begin1 = System.currentTimeMillis();
		for (int i = 0; i < num; i++) {
			Runnable syncRunnable = new Runnable() {
				@Override
				public void run() {
					List<Integer> indexs = BitSetIndexModel.getRealIndexs(UserIndexStore.getInstance().query("北京", "girl", "18"));
					for (Integer index : indexs) {
						UserIndexStore.getInstance().findUser(index);
					}
				}
			};
			executorService.execute(syncRunnable);
		}
		executorService.shutdown();
		while (true) {
			try {
				if (executorService.awaitTermination(1, TimeUnit.SECONDS)) {
					System.out.println("單次查詢時間為:" + (System.currentTimeMillis() - begin1) / num + "ms");
					break;
				}
			} catch (InterruptedException e) {
				e.printStackTrace();
			}
		}
	}

	private static List<User> buildData() {
		String[] addresss = { "北京", "上海", "深圳" };
		String[] ages = { "16", "17", "18" };
		List<User> users = new ArrayList<>();
		for (int i = 0; i < 200000; i++) {
			User user = new User();
			user.setName("name" + i);
			int rand = ThreadLocalRandom.current().nextInt(3);
			user.setAddress(addresss[ThreadLocalRandom.current().nextInt(3)]);
			user.setGender((rand & 1) == 0 ? "girl" : "boy");
			user.setAge(ages[ThreadLocalRandom.current().nextInt(3)]);
			users.add(user);
		}
		return users;
	}

}

測試結果(查詢2w次):

數據量(users.size()) 並發數 平均查詢時間
20w 10 1ms
50w 20 3ms
100w 50 9ms

測試機為thinkpad x240 i5 8g內存

4.總結

優點
通過測試發現隨着數據量的增大和並發數的提高,平均耗時並沒有明顯升高,並且響應時間都在10ms以內

缺點

  1. 不適合數據量太大的情況,因為需要把數據全部加載進內存
  2. 不適合復雜查詢
  3. 不適合對name,id等唯一值做查詢
后記

因為我們的查詢業務比較簡單,唯一的要求是速度,並且數據量也不大,每張表的數據量都不超過100w,所以使用這種方式比較合適。
在本篇文章中只談到了如何創建索引,以及最基本的查詢,在下一篇中我會繼續說明如何更新索引,以及一些復雜查詢,比如<,>,between and。

轉載請注明出處


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM