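Background: I need to build a whitelist with no duplicates and later run binary searches on it. The list must therefore be sorted and free of duplicates, and binary search calls for a container with random-access iterators. The candidates are vector, array and deque. array has a fixed size, so vector and deque are the better fits. deque's strength is constant-time insertion at both ends (insertion in the middle is still linear), but that advantage does not show here: every element is appended at the back, where vector (amortized) and deque are both constant time. Random access into a deque is also slower than into a vector, so I use vector.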
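As a reminder of what the finished list is for, here is a minimal sketch of the lookup, assuming the vector is already sorted in ascending order. The function name `is_whitelisted` is made up for illustration and does not appear in the original programs.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: returns true if `key` is in the whitelist.
// Precondition (assumed, not checked): `sorted_list` is sorted ascending.
bool is_whitelisted(const std::vector<int>& sorted_list, int key)
{
    // binary_search needs only O(log n) comparisons on random-access iterators
    return std::binary_search(sorted_list.begin(), sorted_list.end(), key);
}
```

Calling `is_whitelisted(white_list, 42)` after the list has been sorted and deduplicated answers a membership query without scanning the whole vector.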
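1. Using the STL algorithm unique
Sort the vector first, then combine erase with the unique algorithm. The sort is required because unique only collapses adjacent duplicates. The test input holds one million integers with relatively few duplicates.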
- #include <algorithm>
- #include <cstdlib>
- #include <ctime>
- #include <fstream>
- #include <iostream>
- #include <vector>
- using namespace std;
- int main()
- {
-     ifstream fwhite;
-     int number;
-     vector<int> white_list;
-     clock_t start;
-     fwhite.open("largeW.txt");
-     if(!fwhite.is_open())
-     {//or use .good .fail or directly use ! to judge if the file has been opened successfully
-         cout<<"can't open file list"<<endl;
-         exit(EXIT_FAILURE);
-     }
-     start=clock();
-     while(fwhite>>number)
-     {//test the read itself; looping on !eof() stores the last value twice
-         white_list.push_back(number);
-     }
-     cout<<"Time to load data : "<<double(clock()-start)/CLOCKS_PER_SEC<<" s"<<endl;
-     start=clock();//restart the timer so each phase is measured on its own
-     sort(white_list.begin(),white_list.end());
-     white_list.erase(unique(white_list.begin(),white_list.end()),white_list.end());
-     cout<<"Time to remove duplicate data from vector : "<<double(clock()-start)/CLOCKS_PER_SEC<<" s"<<endl;
-     start=clock();
-     ofstream fout("sort_white.txt",ios::trunc);
-     vector<int>::iterator iter=white_list.begin();
-     while (iter!= white_list.end())
-     {
-         fout<<*iter<<endl;
-         ++iter;
-     }
-     cout<<"Time to write data into file : "<<double(clock()-start)/CLOCKS_PER_SEC<<" s"<<endl;
-     return EXIT_SUCCESS;
- }
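The core of this method, isolated from the file handling, can be sketched as a small helper. The name `sort_and_dedupe` is an assumption introduced here for illustration only.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: sorts `v` and removes duplicate values in place.
void sort_and_dedupe(std::vector<int>& v)
{
    std::sort(v.begin(), v.end());
    // unique() shifts the distinct values to the front and returns the new
    // logical end; erase() then drops the leftover tail.
    v.erase(std::unique(v.begin(), v.end()), v.end());
}
```

Without the erase call the vector keeps its old size, and the elements after the returned iterator have unspecified values.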
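2. Using set together with copy
Read the data straight into a set, then copy it into a vector. The copy has to go through an insert_iterator: the vector starts out empty, so copying through a plain iterator would write past its end (the overflow problem).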
- #include <algorithm>
- #include <cstdlib>
- #include <ctime>
- #include <fstream>
- #include <iostream>
- #include <iterator>
- #include <set>
- #include <vector>
- using namespace std;
- int main()
- {
-     ifstream fwhite;
-     int number;
-     vector<int> white_list;
-     set<int> ori_list;
-     clock_t start;
-     fwhite.open("largeW.txt");
-     if(!fwhite.is_open())
-     {//or use .good .fail or directly use ! to judge if the file has been opened successfully
-         cout<<"can't open file list"<<endl;
-         exit(EXIT_FAILURE);
-     }
-     start=clock();
-     while(fwhite>>number)
-     {//test the read itself; looping on !eof() handles the last value twice
-         ori_list.insert(number);//the set keeps its elements sorted and unique
-     }
-     cout<<"Time to load data : "<<double(clock()-start)/CLOCKS_PER_SEC<<" s"<<endl;
-     start=clock();//restart the timer so each phase is measured on its own
-     insert_iterator<vector<int> > it(white_list,white_list.begin());
-     copy(ori_list.begin(),ori_list.end(),it);
-     cout<<"Time to copy data from set to vector : "<<double(clock()-start)/CLOCKS_PER_SEC<<" s"<<endl;
-     start=clock();
-     ofstream fout("sort_white.txt",ios::trunc);
-     vector<int>::iterator iter=white_list.begin();
-     while (iter!= white_list.end())
-     {
-         fout<<*iter<<endl;
-         ++iter;
-     }
-     cout<<"Time to write data into file : "<<double(clock()-start)/CLOCKS_PER_SEC<<" s"<<endl;
-     return EXIT_SUCCESS;
- }
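The set-to-vector step does not strictly need an insert_iterator. As an alternative sketch (not what the program above does, and not timed here), the vector can be built directly from the set's iterator range, which lets it allocate its storage once. The name `to_sorted_vector` is hypothetical.

```cpp
#include <set>
#include <vector>

// Hypothetical helper: builds a vector from a set's elements.
// A set iterates in ascending order, so the result is sorted and unique.
std::vector<int> to_sorted_vector(const std::set<int>& s)
{
    // The range constructor sizes the vector up front instead of growing it
    // one insertion at a time.
    return std::vector<int>(s.begin(), s.end());
}
```

This only changes the copy phase; it does nothing about the cost of filling the set in the first place.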
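3. Time cost, measured with clock from the point where the container starts being filled
Method 1: 8.477 s
Method 2: 23.246 s
So the plain vector combined with unique is the better choice. The reason: inserting the same one million integers into a set takes too long, about 18 seconds in my test, and that is the dominant cost.
Method 1: reading the file into the vector took 5.852 s, sorting and removing duplicates took 3.205 s, and writing the file took 15.624 s, for a total of about 24 s.
Method 2: reading the file into the set took 18.893 s, copying the data from the set into the vector took 4.884 s, and writing the file took 20 s, for a total of about 44 s.
Writing the file is clearly slow, though. This example writes by stepping an iterator and dereferencing it; would indexing with a subscript be faster? Or the copy function with an ostream_iterator?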
4. Building on method 1, write the file with subscripts instead of an iterator
This brought no noticeable improvement.
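5. Using a single bulk copy with ostream_iterator
In this example the write time dropped by nearly half. A likely contributor is that the "\n" delimiter does not flush the stream, whereas endl flushes after every line.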
- #include <algorithm>
- #include <cstdlib>
- #include <ctime>
- #include <fstream>
- #include <iostream>
- #include <iterator>
- #include <vector>
- using namespace std;
- int main()
- {
-     ifstream fwhite;
-     int number;
-     vector<int> white_list;
-     clock_t start;
-     fwhite.open("largeW.txt");
-     if(!fwhite.is_open())
-     {//or use .good .fail or directly use ! to judge if the file has been opened successfully
-         cout<<"can't open file list"<<endl;
-         exit(EXIT_FAILURE);
-     }
-     start=clock();
-     while(fwhite>>number)
-     {//test the read itself; looping on !eof() stores the last value twice
-         white_list.push_back(number);
-     }
-     cout<<"Time to load data : "<<double(clock()-start)/CLOCKS_PER_SEC<<" s"<<endl;
-     start=clock();//restart the timer so each phase is measured on its own
-     sort(white_list.begin(),white_list.end());
-     white_list.erase(unique(white_list.begin(),white_list.end()),white_list.end());
-     cout<<"Time to remove duplicate data from vector : "<<double(clock()-start)/CLOCKS_PER_SEC<<" s"<<endl;
-     start=clock();
-     ofstream fout("sort_white.txt",ios::trunc);
-     /*method 1: step an iterator
-     vector<int>::iterator iter=white_list.begin();
-     while (iter!= white_list.end())
-     {
-         fout<<*iter<<endl;
-         ++iter;
-     }*/
-     //method 4: subscripts
-     //for(unsigned int index = 0;index< white_list.size();index++)
-     //{
-     //    fout<<white_list[index]<<endl;
-     //}
-     copy(white_list.begin(),white_list.end(),ostream_iterator<int,char>(fout,"\n"));
-     fout.flush();//"\n" does not flush, so flush once before stopping the timer
-     cout<<"Time to write data into file : "<<double(clock()-start)/CLOCKS_PER_SEC<<" s"<<endl;
-     return EXIT_SUCCESS;//returning (rather than exit) lets fout close normally
- }
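Note: the largeW file comes from the website of Algorithms, 4th Edition; alternatively, you can generate one yourself with the rand function. One int per line, one million lines, is enough.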
