Calling the iFLYTEK OCR API from .NET to Recognize Printed Text in Images


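The OCR services I know of on the market are Google, iFLYTEK, and Baidu. This article shows how to call the iFLYTEK API from .NET to recognize the text in an image: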

Step 1: Register an account. After completing real-name verification you can claim a free quota of recognition calls:

 

Once you create a project, you get its ID and key (the x_appid and api_key values in the code below);

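Because I call the service through its Web API, it is enough to send an HTTP request with the right parameters; no DLL needs to be referenced. You can also use the SDK instead, which is what I do later with Baidu's OCR: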

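        // Namespaces used by the code in this article (place at the top of the file):
        // using System; using System.IO; using System.Net; using System.Text;
        // using System.Collections.Generic; using Newtonsoft.Json;

        // Returns the MD5 hash of s (UTF-8 encoded) as 32 lowercase hex characters.
        public static string Md5(string s)
        {
            using (var md5 = System.Security.Cryptography.MD5.Create())
            {
                byte[] hash = md5.ComputeHash(System.Text.Encoding.UTF8.GetBytes(s));
                var sb = new System.Text.StringBuilder(hash.Length * 2);
                foreach (byte b in hash)
                {
                    // "x2" always yields two hex digits, so no extra padding is needed.
                    sb.Append(b.ToString("x2"));
                }
                return sb.ToString();
            }
        }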

        public static void Headers()
        {
            string x_appid = "*****";
            string api_key = "********";
            string path = @"E:\imgFile\15.jpg";
            string param = @"{""language"":""en"",""location"": ""false""}";

            // X-Param: the JSON parameters, Base64-encoded.
            string x_param = Convert.ToBase64String(Encoding.UTF8.GetBytes(param));

            // X-CurTime: current UTC time as Unix seconds.
            TimeSpan ts = DateTime.UtcNow - new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
            string curTime = Convert.ToInt64(ts.TotalSeconds).ToString();

            // X-CheckSum: MD5 of api_key + curTime + x_param.
            string result = string.Format("{0}{1}{2}", api_key, curTime, x_param);
            string X_checksum = Program.Md5(result);

            // Form body: the Base64 image is URL-encoded because it can contain '+', '/' and '='.
            byte[] arr = File.ReadAllBytes(path);
            string cc = Convert.ToBase64String(arr);
            string data = "image=" + WebUtility.UrlEncode(cc);
            byte[] body = Encoding.UTF8.GetBytes(data);

            string Url = @"https://webapi.xfyun.cn/v1/service/v1/ocr/general";

            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(Url);
            request.Method = "POST";
            request.ContentType = "application/x-www-form-urlencoded; charset=utf-8";
            request.Headers["X-Appid"] = x_appid;
            request.Headers["X-CurTime"] = curTime;
            request.Headers["X-Param"] = x_param;
            request.Headers["X-CheckSum"] = X_checksum;

            // Write exactly the bytes that ContentLength was computed from.
            request.ContentLength = body.Length;
            using (Stream requestStream = request.GetRequestStream())
            {
                requestStream.Write(body, 0, body.Length);
            }

            string htmlStr;
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (Stream responseStream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(responseStream, Encoding.UTF8))
            {
                htmlStr = reader.ReadToEnd();
            }

            var json = JsonConvert.DeserializeObject<Root>(htmlStr);
            if (json == null || json.data == null || json.data.block == null)
            {
                // No recognition data (for example an error response): show the raw response.
                Console.WriteLine(htmlStr);
                Console.ReadLine();
                return;
            }

            // Concatenate every recognized word, each followed by the confidence of its line.
            StringBuilder str = new StringBuilder();
            foreach (var item1 in json.data.block)
            {
                foreach (var item in item1.line)
                {
                    foreach (var item2 in item.word)
                    {
                        str.Append(item2.content).Append(item.confidence);
                    }
                }
            }
            Console.WriteLine(str.ToString());
            Console.ReadLine();

        }


        static void Main(string[] args)
        {
            Headers();
        }
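
The signing steps in Headers() can also be pulled out into small pure functions, which makes them easy to check without calling the service. The following is a minimal sketch that assumes the same scheme as the code above (X-Param is the Base64 of the JSON parameters, X-CheckSum is the MD5 of api_key + curTime + X-Param). The class name OcrRequestBuilder and its method names are made up for illustration and are not part of any iFLYTEK SDK.

```csharp
using System;
using System.Globalization;
using System.Net;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper; the names are illustrative only.
public static class OcrRequestBuilder
{
    // Base64-encodes the JSON parameter string for the X-Param header.
    public static string BuildParam(string paramJson)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(paramJson));
    }

    // Converts a UTC time to Unix seconds for the X-CurTime header.
    public static string BuildCurTime(DateTime utcNow)
    {
        var epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        long seconds = (long)(utcNow - epoch).TotalSeconds;
        return seconds.ToString(CultureInfo.InvariantCulture);
    }

    // MD5(apiKey + curTime + xParam) as 32 lowercase hex characters, for X-CheckSum.
    public static string BuildCheckSum(string apiKey, string curTime, string xParam)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(apiKey + curTime + xParam));
            var sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
            {
                sb.Append(b.ToString("x2"));
            }
            return sb.ToString();
        }
    }

    // Builds the form body. The Base64 text is URL-encoded because it can
    // contain '+', '/' and '=', which have special meaning in a form body.
    public static string BuildBody(byte[] imageBytes)
    {
        return "image=" + WebUtility.UrlEncode(Convert.ToBase64String(imageBytes));
    }
}
```

Passing the time in as an argument, rather than reading DateTime.UtcNow inside the helper, keeps the functions deterministic; the caller would pass DateTime.UtcNow when building a real request.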

The recognition result comes back as JSON:

Deserializing it with Newtonsoft.Json gives you the data:

The model classes:

    public class Root
    {
        /// <summary>
        /// Status code of the response
        /// </summary>
        public string code { get; set; }
        /// <summary>
        /// Recognition data
        /// </summary>
        public Data data { get; set; }
        /// <summary>
        /// Status description
        /// </summary>
        public string desc { get; set; }
        /// <summary>
        /// Session ID
        /// </summary>
        public string sid { get; set; }
    }
    public class Data
    {
        /// <summary>
        /// Recognized blocks
        /// </summary>
        public List<BlockItem> block { get; set; }
    }
    public class BlockItem
    {
        /// <summary>
        /// Block type
        /// </summary>
        public string type { get; set; }
        /// <summary>
        /// Lines in this block
        /// </summary>
        public List<LineItem> line { get; set; }
    }
    public class LineItem
    {
        /// <summary>
        /// Confidence of this line
        /// </summary>
        public int confidence { get; set; }
        /// <summary>
        /// Words in this line
        /// </summary>
        public List<WordItem> word { get; set; }
    }
    public class WordItem
    {
        /// <summary>
        /// Recognized text
        /// </summary>
        public string content { get; set; }
    }
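
Walking the block / line / word hierarchy is easier to reuse when it lives in its own method that tolerates missing parts of the response. The sketch below is one way to do that. It assumes the same class shape as above (repeated as nested classes so the example compiles on its own), and joining words with a single space is my assumption, which suits English text. OcrResultText and ExtractLines are illustrative names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the names are illustrative only.
public static class OcrResultText
{
    // Same shape as the model classes above, nested here so the sketch is self-contained.
    public class Root { public string code { get; set; } public Data data { get; set; } public string desc { get; set; } public string sid { get; set; } }
    public class Data { public List<BlockItem> block { get; set; } }
    public class BlockItem { public string type { get; set; } public List<LineItem> line { get; set; } }
    public class LineItem { public int confidence { get; set; } public List<WordItem> word { get; set; } }
    public class WordItem { public string content { get; set; } }

    // Returns one string per recognized line, with the words joined by a space.
    // Returns an empty list when the response carries no data.
    public static List<string> ExtractLines(Root root)
    {
        var lines = new List<string>();
        if (root == null || root.data == null || root.data.block == null)
        {
            return lines;
        }
        foreach (BlockItem block in root.data.block)
        {
            if (block == null || block.line == null) continue;
            foreach (LineItem line in block.line)
            {
                if (line == null || line.word == null) continue;
                var words = new List<string>();
                foreach (WordItem word in line.word)
                {
                    if (word != null && !string.IsNullOrEmpty(word.content))
                    {
                        words.Add(word.content);
                    }
                }
                lines.Add(string.Join(" ", words));
            }
        }
        return lines;
    }
}
```

With the deserialized object from Headers(), the call would look like `OcrResultText.ExtractLines(json)`, provided the same classes are used for deserialization.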

The demo above is adapted from the official demo.

Also attached: helper methods that fetch an image from a URL and Base64-encode it:

        #region Image To base64
        /// <summary>
        /// Downloads the image at the given URL.
        /// </summary>
        public static Image UrlToImage(string url)
        {
            using (WebClient mywebclient = new WebClient())
            {
                byte[] Bytes = mywebclient.DownloadData(url);
                using (MemoryStream ms = new MemoryStream(Bytes))
                using (Image decoded = Image.FromStream(ms))
                {
                    // An Image created by FromStream needs its stream for its whole lifetime,
                    // so copy it into a Bitmap that no longer depends on the stream.
                    return new Bitmap(decoded);
                }
            }
        }

        /// <summary>
        /// Converts an Image to Base64 (re-encoded as JPEG).
        /// </summary>
        /// <param name="img">The image to encode</param>
        public static string ImageToBase64(Image img)
        {
            try
            {
                using (Bitmap bmp = new Bitmap(img))
                using (MemoryStream ms = new MemoryStream())
                {
                    bmp.Save(ms, System.Drawing.Imaging.ImageFormat.Jpeg);
                    return Convert.ToBase64String(ms.ToArray());
                }
            }
            catch (Exception)
            {
                // Returns null when the image cannot be encoded.
                return null;
            }
        }
        public static string ImageToBase64(string url)
        {
            return ImageToBase64(UrlToImage(url));
        }
        #endregion
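
When the image is already a file on disk, its bytes can be Base64-encoded directly, as Headers() does, without going through System.Drawing. This avoids re-compressing the picture as JPEG, which can only lose detail. A minimal sketch follows; ImageFileEncoder and FileToBase64 are illustrative names.

```csharp
using System;
using System.IO;

// Hypothetical helper; the names are illustrative only.
public static class ImageFileEncoder
{
    // Reads the file as-is and returns its bytes as Base64 text.
    // The image is not decoded or re-encoded.
    public static string FileToBase64(string path)
    {
        return Convert.ToBase64String(File.ReadAllBytes(path));
    }
}
```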

The recognized text can then be parsed with regular expressions to pull out the data you need.
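
As an illustration of that last step, the sketch below pulls dates out of a recognized string. The yyyy-MM-dd pattern is only an assumed example of "the data you need"; a real document would need a pattern that matches its own layout. OcrTextParser and FindDates are illustrative names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the names and the pattern are illustrative only.
public static class OcrTextParser
{
    // Matches dates written as four digits, two digits, two digits, separated by '-'.
    private static readonly Regex DatePattern = new Regex(@"\b\d{4}-\d{2}-\d{2}\b");

    // Returns every date-like substring found in the text, in order of appearance.
    public static List<string> FindDates(string text)
    {
        var found = new List<string>();
        foreach (Match m in DatePattern.Matches(text ?? string.Empty))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

OCR output often contains misread characters, so in practice it helps to keep patterns tolerant and to validate whatever they capture.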

 

