[C++] Comparing ways to remove duplicate elements from a vector

This article is a repost. Posted 2015-06-23 13:25 under C++.

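Background: I need to build a whitelist with no duplicates and then run binary searches on it. The list therefore has to be sorted and free of duplicates, and binary search calls for a container with random-access iterators: vector, array, or deque. array has a fixed size, so vector and deque are the realistic candidates. deque's advantage is constant-time insertion at both ends (insertion in the middle is still linear, as it is for vector), but that does not help here because every insertion is at the back, where vector's push_back is amortized constant time as well. Random access into a deque is also slower than into a vector, so I chose vector.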
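As a rough illustration of the lookup this whitelist is built for, here is a minimal sketch using std::binary_search. The helper name in_whitelist is hypothetical and not part of the original post, and it assumes the vector has already been sorted in ascending order.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not from the original post): reports whether `value`
// is present in the whitelist.
// Assumption: `sorted_list` is already sorted in ascending order.
bool in_whitelist(const std::vector<int>& sorted_list, int value)
{
    // binary_search needs O(log n) comparisons on a sorted range.
    return std::binary_search(sorted_list.begin(), sorted_list.end(), value);
}
```

Calling in_whitelist(white_list, 42) on an unsorted vector gives unspecified results, which is why the sorting step in the methods below matters.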

1. Using the STL algorithm unique

First sort the vector, then combine erase with the unique algorithm. The test input is a file of one million integers that contains relatively few duplicates.

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    ifstream fwhite("largeW.txt");
    if (!fwhite.is_open())
    {   // .good(), .fail(), or !fwhite would also detect a failed open
        cout << "can't open file list" << endl;
        return EXIT_FAILURE;
    }

    vector<int> white_list;
    int number;
    clock_t start = clock();
    // Test the extraction itself: looping on !eof() can push a stale
    // extra value after the final read fails.
    while (fwhite >> number)
    {
        white_list.push_back(number);
    }
    // Intervals are printed in clock ticks; divide by CLOCKS_PER_SEC for seconds.
    cout << "Time to load data : " << clock() - start << endl;

    start = clock();   // restart the timer so each phase is measured on its own
    sort(white_list.begin(), white_list.end());
    white_list.erase(unique(white_list.begin(), white_list.end()), white_list.end());
    cout << "Time to remove duplicate data from vector : " << clock() - start << endl;

    start = clock();
    ofstream fout("sort_white.txt", ios::trunc);
    for (vector<int>::iterator iter = white_list.begin(); iter != white_list.end(); ++iter)
    {
        fout << *iter << endl;
    }
    cout << "Time to write data into file : " << clock() - start << endl;
    return EXIT_SUCCESS;
}
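The sort step is not optional: unique only collapses duplicates that sit next to each other. The following minimal sketch contrasts the two cases; both helper names are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: sorts a copy of `values`, then drops duplicates.
std::vector<int> sorted_unique(std::vector<int> values)
{
    std::sort(values.begin(), values.end());
    // unique() shifts the kept elements to the front and returns the new
    // logical end; erase() then trims the leftover tail.
    values.erase(std::unique(values.begin(), values.end()), values.end());
    return values;
}

// Hypothetical helper: applies unique() without sorting, for comparison.
// Only runs of adjacent equal elements are collapsed.
std::vector<int> unique_without_sort(std::vector<int> values)
{
    values.erase(std::unique(values.begin(), values.end()), values.end());
    return values;
}
```

For the input 1, 1, 2, 1 the first helper yields 1, 2 while the second yields 1, 2, 1, because the trailing 1 is not adjacent to the others.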

2. Using set together with copy

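Read the data straight into a set, then copy it into the vector. The copy has to go through an insert_iterator: the vector starts out empty, so copying through white_list.begin() directly would write past the end of the vector (an overflow).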

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <iterator>
#include <set>
#include <vector>
using namespace std;

int main()
{
    ifstream fwhite("largeW.txt");
    if (!fwhite.is_open())
    {   // .good(), .fail(), or !fwhite would also detect a failed open
        cout << "can't open file list" << endl;
        return EXIT_FAILURE;
    }

    set<int> ori_list;
    int number;
    clock_t start = clock();
    // Test the extraction itself rather than eof(); the set keeps the
    // values sorted and silently ignores duplicates.
    while (fwhite >> number)
    {
        ori_list.insert(number);
    }
    // Intervals are printed in clock ticks; divide by CLOCKS_PER_SEC for seconds.
    cout << "Time to load data : " << clock() - start << endl;

    start = clock();   // restart the timer so each phase is measured on its own
    vector<int> white_list;
    // The vector is empty, so copy() must insert elements instead of overwriting them.
    insert_iterator<vector<int> > it(white_list, white_list.begin());
    copy(ori_list.begin(), ori_list.end(), it);
    cout << "Time to copy data from set to vector : " << clock() - start << endl;

    start = clock();
    ofstream fout("sort_white.txt", ios::trunc);
    for (vector<int>::iterator iter = white_list.begin(); iter != white_list.end(); ++iter)
    {
        fout << *iter << endl;
    }
    cout << "Time to write data into file : " << clock() - start << endl;
    return EXIT_SUCCESS;
}
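As an aside, the insert_iterator is not the only way to move a set into a vector. A minimal sketch of an alternative, which the original post did not use or time, is the vector's iterator-range constructor; the helper name to_vector is hypothetical.

```cpp
#include <set>
#include <vector>

// Hypothetical helper (an alternative to the insert_iterator approach,
// not measured in the original post): builds a vector from a set.
std::vector<int> to_vector(const std::set<int>& values)
{
    // A set iterates in ascending order, so the result is sorted and
    // duplicate-free. The range constructor allocates once for all elements.
    return std::vector<int>(values.begin(), values.end());
}
```

For a vector that already exists, white_list.assign(ori_list.begin(), ori_list.end()) has the same effect. Whether either variant changes the timings reported below is untested here.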

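3. Timings, measured with clock from the point where container construction starts

Method 1 took 8.477 seconds.

Method 2 took 23.246 seconds.

Using a vector directly and then applying unique is clearly the better choice. The reason is that inserting the same one million integers into a set takes far too long: about 18 seconds in testing, which is the dominant cost.

Method 1: reading the file into the vector took 5.852 seconds, sorting and removing duplicates took 3.205 seconds, and writing the file took 15.624 seconds, for a total of about 24 seconds.

Method 2: reading the file into the set took 18.893 seconds, copying from the set to the vector took 4.884 seconds, and writing the file took 20 seconds, for a total of about 44 seconds.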
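The figures above are in seconds, whereas clock returns a tick count. A minimal sketch of the conversion, assuming each phase records its own start and end value; the helper name elapsed_seconds is hypothetical.

```cpp
#include <ctime>

// Hypothetical helper: converts the interval between two clock() readings
// into seconds. CLOCKS_PER_SEC is the number of ticks per second.
double elapsed_seconds(std::clock_t start, std::clock_t end)
{
    return static_cast<double>(end - start) / CLOCKS_PER_SEC;
}
```

A phase would then be timed as: take start = clock() before it, end = clock() after it, and print elapsed_seconds(start, end).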

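It is also clear that writing the file is slow. This example writes the values one at a time through an iterator. Would indexing with a subscript be faster? Or using the copy function with an ostream_iterator?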

4. Same as method 1, but writing the file with subscripts instead of an iterator

This made no noticeable difference.

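5. Writing with a single copy call and ostream_iterator

In this example the write time dropped by nearly half. One likely contributor: the earlier loops end every line with endl, which flushes the stream each time, while the ostream_iterator version writes a plain "\n".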

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>
using namespace std;

int main()
{
    ifstream fwhite("largeW.txt");
    if (!fwhite.is_open())
    {   // .good(), .fail(), or !fwhite would also detect a failed open
        cout << "can't open file list" << endl;
        return EXIT_FAILURE;
    }

    vector<int> white_list;
    int number;
    clock_t start = clock();
    // Test the extraction itself: looping on !eof() can push a stale
    // extra value after the final read fails.
    while (fwhite >> number)
    {
        white_list.push_back(number);
    }
    // Intervals are printed in clock ticks; divide by CLOCKS_PER_SEC for seconds.
    cout << "Time to load data : " << clock() - start << endl;

    start = clock();   // restart the timer so each phase is measured on its own
    sort(white_list.begin(), white_list.end());
    white_list.erase(unique(white_list.begin(), white_list.end()), white_list.end());
    cout << "Time to remove duplicate data from vector : " << clock() - start << endl;

    start = clock();
    ofstream fout("sort_white.txt", ios::trunc);
    // Replaces the iterator loop (section 1) and the subscript loop (section 4):
    // one copy() call writes every element followed by "\n".
    copy(white_list.begin(), white_list.end(), ostream_iterator<int>(fout, "\n"));
    cout << "Time to write data into file : " << clock() - start << endl;
    return EXIT_SUCCESS;
}

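Note: the largeW file comes from the website of 《算法4》 (Algorithms, 4th Edition). Alternatively, you can generate one yourself with the rand function: one int per line, one million lines is enough.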
