Quickly scanning a text file to count its lines and return the index position of each line (Delphi, C#)

Reposted article, originally published 2012-01-12 15:39. Tags: Delphi, C#, files, line counting. Category: Delphi/C# work and study notes.

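For a project I needed to scan a text file of 12 million lines. After some pointers from other readers and some testing, I found that the gap between C# and Delphi is not large. Without further ado, here are the code and the test results: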

Here is the Delphi code:

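// Scan the file for line breaks and return the byte offset at which each line starts.
// RowLeast (minimum line length in bytes) and perArray (array growth increment)
// are constants defined elsewhere in the unit.
function ScanEnterFile(const FileName: string): TInt64Array;
var
  MyFile: TMemoryStream;  // file contents held in memory
  rArray: TInt64Array;    // resulting line-start offsets
  size, curIndex: Int64;  // file size, current position
  enterCount: Int64;      // number of line breaks found
  pc: PAnsiChar;          // byte-wise view of the buffer (PChar is 2 bytes wide in Unicode Delphi)
  arrayCount: Int64;      // current capacity of the index array
  addStep: Integer;       // bytes to skip once a line break is found
begin
  Result := nil;
  if (FileName = '') or not FileExists(FileName) then
    Exit;
  MyFile := TMemoryStream.Create;
  try
    MyFile.LoadFromFile(FileName); // read the whole file into memory
    size := MyFile.Size;
    pc := MyFile.Memory;           // point the byte pointer at the stream's memory
    curIndex := RowLeast;
    SetLength(rArray, perArray);
    arrayCount := perArray;
    enterCount := 0;
    rArray[0] := 0;                // the first line starts at offset 0
    while curIndex < size do       // stop before reading past the end of the buffer
    begin
      addStep := 0;
      if Ord(pc[curIndex]) = 13 then
        addStep := 2   // on CR: skip both CR and LF
      else if Ord(pc[curIndex]) = 10 then
        addStep := 1;  // on LF: skip the LF only
      // A line break was found
      if addStep <> 0 then
      begin
        Application.ProcessMessages;
        // Record one more line
        Inc(enterCount);
        // Grow the array if it is full
        if enterCount mod perArray = 0 then
        begin
          arrayCount := arrayCount + perArray;
          SetLength(rArray, arrayCount);
        end;
        rArray[enterCount] := curIndex + addStep;
        // The next line is at least RowLeast bytes long, so skip that far
        curIndex := curIndex + addStep + RowLeast;
      end
      else
        // Step by 2: a CRLF pair is two bytes wide, so one of the two is always hit
        curIndex := curIndex + 2;
    end;
    SetLength(rArray, enterCount + 1); // trim the unused capacity
    Result := rArray;
  finally
    MyFile.Free;
  end;
end;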

Calling code:

procedure TMainForm.btn2Click(Sender: TObject);
var
  datasIndex: TInt64Array; // line index of the data file
  t1: Cardinal;            // start time in milliseconds
begin
  t1 := GetTickCount;
  datasIndex := ScanEnterFile('R:\201201_dataFile.txt');
  Caption := Caption + ' :: ' + IntToStr(GetTickCount - t1);
end;

Result: 16782 ms

Here is the C# code:

        /// <summary>
        /// Scans a text file, counts its lines, and returns the start offset of each line
        /// (on the 12-million-row data this was 10 seconds faster than the array-based version).
        /// </summary>
        /// <param name="fileName">File name</param>
        /// <param name="rowCount">Number of lines</param>
        /// <param name="rowLeast">Minimum length of a line</param>
        /// <param name="progress">Progress callback, called with the running line-break count</param>
        /// <returns>List of line-start offsets</returns>
        public static IList<long> ScanEnterFile(string fileName, out int rowCount, int rowLeast, ThreadProgress progress)
        {
            rowCount = 0;
            if (string.IsNullOrEmpty(fileName))
                return null;
            if (!System.IO.File.Exists(fileName))
                return null;
            IList<long> rList = new List<long>();
            int enterCount = 0; // number of line breaks found
            // Open the file as a stream; it is read byte by byte, not loaded into memory
            using (FileStream myFile = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read, 8))
            {
                int checkValue;
                int addStep;
                myFile.Position = rowLeast;
                checkValue = myFile.ReadByte();
                while (checkValue != -1)
                {
                    // Application.DoEvents();
                    addStep = -1;
                    // ReadByte has already advanced the position by one byte.
                    // On CR (the first byte of CRLF), skip one more byte to pass the LF.
                    // On LF (the second byte), no extra skip is needed.
                    if (checkValue == 13)
                        addStep = 1;
                    else if (checkValue == 10)
                        addStep = 0;
                    if (addStep >= 0)
                    {
                        enterCount++;
                        rList.Add(myFile.Position + addStep);
                        myFile.Seek(rowLeast + addStep, SeekOrigin.Current);
                        progress(enterCount);
                    }
                    else
                        // ReadByte moved one byte and Seek moves one more, so every second
                        // byte is checked, as in the Delphi version. Seek(2) would step by
                        // three bytes and could jump over a CRLF pair.
                        myFile.Seek(1, SeekOrigin.Current);
                    checkValue = myFile.ReadByte();
                }
            }
            rowCount = enterCount + 1;
            return rList;
        }

Calling code:

            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();
            int rowCount;
            FileHelper.ScanEnterFile(@"R:\201201_dataFile.txt", out rowCount, 35, outputProgress);
            useTime = stopwatch.ElapsedMilliseconds;

Result:

124925 ms

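(After a good deal of criticism and advice from readers: this method does not load the file into memory but reads it one byte at a time, which is much slower than the Delphi approach of loading the file into memory. It only suits old machines that are short on memory. Memory is cheap these days, so this method is now obsolete. Following the readers' advice, the version below uses ReadLine and takes about 6 seconds.)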

        public static IList<long> ScanEnterFile(string fileName, ThreadProgress progress)
        {
            if (string.IsNullOrEmpty(fileName))
                return null;
            if (!System.IO.File.Exists(fileName))
                return null;
            IList<long> rList = new List<long>();
            rList.Add(0); // the first line starts at offset 0
            using (StreamReader sr = File.OpenText(fileName))
            {
                string rStr = sr.ReadLine();
                while (null != rStr)
                {
                    // Next start = previous start + line length + 2 for the CRLF.
                    // Note that rStr.Length counts characters, not bytes.
                    rList.Add(rList[rList.Count - 1] + rStr.Length + 2);
                    rStr = sr.ReadLine();
                    progress(rList.Count);
                }
            }
            return rList;
        }

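Testing showed that when the file contains Chinese characters, the positions computed by this method are wrong: rStr.Length counts characters, while file offsets are counted in bytes, and a Chinese character takes more than one byte in encodings such as UTF-8 or GBK. I will update this post once I find a solution.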
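One possible way around the character-versus-byte mismatch is to skip text decoding entirely and scan the raw bytes in large blocks, recording the position after every LF byte. The following is only a sketch, not the author's solution and not timed against the versions above. `LineIndexSketch`, `ScanLineStarts`, `ReadLineAt` and the 1 MB buffer size are hypothetical names and values chosen for illustration. It assumes an ASCII-compatible encoding (for example UTF-8 or GBK), where byte 10 only ever means a line break, and `ReadLineAt` additionally assumes UTF-8.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineIndexSketch
{
    // Returns the byte offset at which each line starts (the first entry is 0).
    // Hypothetical helper: assumes an ASCII-compatible encoding such as UTF-8
    // or GBK, where byte 10 (LF) only ever appears as a line break.
    public static List<long> ScanLineStarts(string fileName, int bufferSize = 1 << 20)
    {
        var starts = new List<long> { 0 };
        var buffer = new byte[bufferSize];
        long basePos = 0; // file offset of buffer[0]
        using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    // CRLF and LF-only line breaks both end in LF,
                    // so the next line starts right after it.
                    if (buffer[i] == 10)
                        starts.Add(basePos + i + 1);
                }
                basePos += read;
            }
        }
        return starts;
    }

    // Reads line number `index` (0-based) through the offset list.
    // UTF-8 is an assumption here; pass the offsets from ScanLineStarts.
    public static string ReadLineAt(string fileName, IList<long> starts, int index)
    {
        using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            long end = index + 1 < starts.Count ? starts[index + 1] : fs.Length;
            var bytes = new byte[end - starts[index]];
            fs.Seek(starts[index], SeekOrigin.Begin);
            int total = 0;
            while (total < bytes.Length)
            {
                int n = fs.Read(bytes, total, bytes.Length - total);
                if (n == 0) break; // unexpected end of file
                total += n;
            }
            return Encoding.UTF8.GetString(bytes, 0, total).TrimEnd('\r', '\n');
        }
    }
}
```

As with the ReadLine version, a file that ends with a line break produces one final entry equal to the file length. The sketch checks every byte instead of using the minimum-line-length skip, so it trades that shortcut for not having to know the line layout in advance.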

Testing also showed that in C#, using IList<T> is faster than using an array.
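The array-based C# version is not shown in this post, so the reason for the difference can only be guessed at. One plausible factor, assuming the array was enlarged by a fixed increment with something like `Array.Resize` (as the Delphi code does with perArray), is the amount of copying: each resize copies every element stored so far, whereas `List<T>` doubles its capacity when full. The sketch below just counts those copies for both strategies. `GrowthSketch` and its method names are hypothetical.

```csharp
using System;

public static class GrowthSketch
{
    // Counts element copies when an array is enlarged by a fixed increment
    // `step` every time it fills up (assumed strategy of the array version).
    public static long CopiesWithFixedStep(long items, long step)
    {
        long capacity = step, copies = 0;
        for (long count = 0; count < items; count++)
        {
            if (count == capacity)
            {
                copies += count;  // a resize copies every existing element
                capacity += step;
            }
        }
        return copies;
    }

    // Counts element copies when the capacity doubles every time it fills up,
    // which is the growth strategy List<T> uses.
    public static long CopiesWithDoubling(long items, long initialCapacity)
    {
        long capacity = initialCapacity, copies = 0;
        for (long count = 0; count < items; count++)
        {
            if (count == capacity)
            {
                copies += count;
                capacity *= 2;
            }
        }
        return copies;
    }
}
```

Calling `GrowthSketch.CopiesWithFixedStep(12000000, 10000)` and `GrowthSketch.CopiesWithDoubling(12000000, 4)` (the step of 10000 is an assumed value) shows the pattern: with a fixed step the total number of copies grows roughly with the square of the line count, while with doubling it stays below twice the line count.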

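Summary: everything has its own value. Which one to choose is up to you and your needs, and I am not taking either side here. In the end, if it gets the job done, nothing else matters.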

Original work by 努力偷懶. If you repost it, please credit the source: http://blog.csdn.net/kfarvid or http://www.cnblogs.com/kfarvid/
