IoT database selection: NoSQL, MemSQL, Cassandra, Riak, OpenTSDB or InfluxDB


Addendum:

Basho has open-sourced its time-series database product, Riak TS 1.3.

The code is on the riak-ts branch of the riak repository on GitHub.

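Riak KV is built on the Riak core and provides a highly resilient, highly available key-value database. Riak KV is under continuous improvement, with a focus on data correctness and on preventing data loss and corruption.

Riak TS derives from Riak KV and is purpose-built for storing time-series data. It integrates all of Riak KV's strengths and uses them to solve the problems users face when handling time-series data. Which features did we actually implement in the product? Here is a partial list: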

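  • A fast write path for data;
  • Schemas for buckets;
  • Query planning and a query subsystem;
  • Parallel data extraction across virtual nodes (vnodes);
  • Flexible composite keys.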

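We also surveyed the time-series database market. At the time we found only a handful of solutions, and none of them was of sufficient quality to carry enterprise-grade production workloads. Existing time-series solutions either lacked scalable clustering or resilience, or were very cumbersome to manage and operate. All of this made them poor choices.

We then held an architecture meeting to discuss ideas for solving this problem. Eventually one of our engineers proposed an interesting idea: use quanta (time ranges) to distribute data around the hash ring. A proof-of-concept prototype built on that idea appeared to work well. From there we began developing Riak TS, aiming to solve many of the harder problems in time-series data processing.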
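The quantum idea can be illustrated with a minimal sketch. This is not Riak TS's actual implementation: the function names, the key format, the partition count, and the choice of SHA-1 as the ring hash are all assumptions made for illustration. The point is only that readings from the same series and the same time range land on the same ring partition.

```python
import hashlib

RING_SIZE = 2 ** 160     # size of the SHA-1 output space, used here as the ring
NUM_PARTITIONS = 64      # hypothetical number of equal-sized ring partitions


def quantum_start(timestamp_ms: int, quantum_ms: int) -> int:
    """Round a timestamp down to the start of its quantum (time range)."""
    return timestamp_ms - (timestamp_ms % quantum_ms)


def partition_for(series: str, timestamp_ms: int, quantum_ms: int) -> int:
    """Map a (series, quantum) pair to a partition index on the ring."""
    # Hypothetical key format: series name plus the quantum's start time.
    key = f"{series}|{quantum_start(timestamp_ms, quantum_ms)}".encode("utf-8")
    position = int.from_bytes(hashlib.sha1(key).digest(), "big")
    return position * NUM_PARTITIONS // RING_SIZE


FIFTEEN_MINUTES_MS = 15 * 60 * 1000
# Two readings one second apart, inside the same 15-minute quantum.
p1 = partition_for("sensor-42", 1_700_000_100_000, FIFTEEN_MINUTES_MS)
p2 = partition_for("sensor-42", 1_700_000_101_000, FIFTEEN_MINUTES_MS)
```

Because both timestamps round down to the same quantum start, `p1` and `p2` are equal, so a range query over that quantum needs to touch only one partition. A larger quantum keeps more data together; a smaller one spreads write load more evenly.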

See:

https://elixirforum.com/t/which-database-for-time-series-data/715/6

http://db-engines.com/en/system/Graphite%3BInfluxDB%3BRiak+TS

 

IoT databases should be as flexible as the application requires. NoSQL databases -- especially key-value, document and column family databases -- easily accommodate different data types and structures without the need for predefined, fixed schemas. NoSQL databases are good options when an organization has multiple data types and those data types will likely change over time. In other cases, applications that collect a fixed set of data -- such as data on weather conditions -- may benefit from a relational model. In-memory SQL databases, such as MemSQL, offer this benefit.
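The trade-off can be sketched in a few lines. This is only an illustration: Python's built-in sqlite3 (in memory) stands in for a relational database such as MemSQL, a plain dictionary of JSON strings stands in for a document or key-value store, and the table, column, and device names are invented.

```python
import json
import sqlite3

# Fixed schema: every weather reading has the same columns, so SQL
# aggregation works directly on typed data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weather ("
    "station TEXT NOT NULL, ts INTEGER NOT NULL, "
    "temperature_c REAL NOT NULL, humidity_pct REAL NOT NULL)"
)
conn.execute(
    "INSERT INTO weather VALUES (?, ?, ?, ?)",
    ("station-1", 1_700_000_000, 21.5, 40.0),
)
avg_temp = conn.execute("SELECT AVG(temperature_c) FROM weather").fetchone()[0]

# Schemaless: each device stores whatever shape it produces, and new
# fields can appear without any migration.
documents = {}


def put_document(key: str, doc: dict) -> None:
    """Store a document as JSON under a key (stand-in for a NoSQL put)."""
    documents[key] = json.dumps(doc)


put_document("thermostat-7", {"temperature_c": 21.5})
put_document("camera-3", {"motion": True, "frame_size": [640, 480]})
```

The relational side rejects a reading with a missing column, which is useful when the data set is fixed. The document side accepts both device shapes, but any query across them has to cope with fields that may be absent.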

Managing a database for IoT applications in-house

For those organizations choosing to manage their own databases, DataStax Cassandra is a highly scalable distributed database that supports a flexible big table schema and fast writes, and scales to large volumes of data. Riak IoT is a distributed, highly scalable key-value data store which integrates with Apache Spark, a big data analytics platform that enables stream analytic processing. Cassandra also integrates with Spark as well as other big data analytics platforms, such as Hadoop MapReduce.

OpenTSDB is an open source database capable of running on Hadoop and HBase. The database is made up of command line interfaces and a Time Series Daemon (TSD). TSDs, which are responsible for processing all database requests, run independently of one another. Even though TSDs use HBase to store time-series data, TSD users have little to no contact with HBase itself.
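As a rough sketch of how a client talks to a TSD rather than to HBase, the snippet below builds a write request for OpenTSDB's HTTP `/api/put` endpoint. The helper names, the metric name, the tag, and the TSD address are assumptions for illustration, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_datapoint(metric: str, timestamp: int, value: float, tags: dict) -> dict:
    """Build one data point in the JSON shape accepted by /api/put."""
    if not tags:
        # OpenTSDB requires at least one tag on every data point.
        raise ValueError("at least one tag is required")
    return {"metric": metric, "timestamp": timestamp, "value": value, "tags": dict(tags)}


def build_put_request(tsd_url: str, datapoints: list) -> urllib.request.Request:
    """Prepare (but do not send) a POST carrying a batch of data points."""
    body = json.dumps(datapoints).encode("utf-8")
    return urllib.request.Request(
        url=f"{tsd_url}/api/put",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical TSD address and metric name.
point = build_datapoint("sensor.temperature", 1_700_000_000, 21.5, {"host": "sensor-1"})
request = build_put_request("http://localhost:4242", [point])
```

Passing `request` to `urllib.request.urlopen` would send it to a running TSD. Since TSDs run independently of one another, a client could send the same kind of request to any of them, for example through a load balancer.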

MemSQL is a relational database tuned for real-time data streaming. With MemSQL, streamed data, transactions and historical data can be kept within the same database. The database also has the capacity to work well with geospatial data out of the box, which could be useful for location-based IoT applications. MemSQL supports integration with Hadoop Distributed File System and Apache Spark, as well as other data warehousing solutions.

 

Source: http://internetofthingsagenda.techtarget.com/feature/Find-the-IoT-database-that-best-fits-your-enterprises-needs

 

You’ve heard the hype: the Internet of Things (IoT) is going to connect more people to devices and more devices to the Internet, and generate more data than any major IT shift in history. IoT is going to be bigger than the web, mobile and the cloud, right? It’s still too early to tell for sure, but at InfluxData we are helping startups and enterprises every day to bring an interconnected world closer to reality.

What does time-series have to do with IoT? Everything, actually. Sensors and devices used in IoT architectures emit time-series data, and a lot of it.

Why are companies building IoT and sensor data solutions?

Whether it’s pH and humidity readings from an agri-sensor, depth and fluid readings from a geo-sensor or voltage and temperature from a power control sensor, these metrics are forming the basis of intelligent businesses. Common use cases we run across are:

  • Agro industries are monitoring and trying to control environmental conditions for optimal plant growth.
  • Power and utility companies are building smart solutions to reduce resource wastage for residential and commercial customers.
  • Research labs and heavy industries are tracking the resources, usage and health of millions of tiny valves and instruments that go into their massive production plants, factories and manufacturing facilities.
  • Smart cars are now powerful computers making runtime decisions based on data collected by hundreds of sensors on every vehicle.


Challenges in building IoT and sensor data solutions

The key challenges organizations face while building an IoT solution are:

  • Bandwidth – Sensors are generally deployed on-premises and communicate over wireless networks, so bandwidth constraints prevent sending large packets of data in real time.
  • Horsepower – Compute power on sensors is generally limited, so analytics software (programs, databases or even processing logic) needs to have a tiny footprint.
  • Concurrency – In industrial IoT, the number of sensors can easily reach hundreds of thousands, each transmitting metrics every minute or so. Anticipating the backend database’s concurrency limits is crucial when designing such solutions.
  • Protocol – As this space is rapidly evolving, there are no definitive standards for communication protocols. MQTT, AMQP, CoAP and others are used depending on the use case, so IoT analytics solutions need to support many communication protocols.
  • Scale – Data retention, compression and visualization have their own challenges at such a large data footprint. Businesses want to plot trends (WoW, MoM, YoY), and aggregating massive data sets can be very compute-heavy.
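One common response to the bandwidth and footprint constraints above is to encode readings compactly and send them in small batches. The sketch below formats readings as InfluxDB line protocol text and groups them into batches. It is a simplified illustration, not a client library: it handles only float fields, the helper names and the sample measurement are invented, and the batch size is an arbitrary assumption.

```python
def _escape(text: str) -> str:
    """Escape commas, equals signs and spaces in tag/field keys and tag values."""
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def to_line(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Format one reading as: measurement,tag=value field=value timestamp."""
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(f",{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items()))
    # Only float fields are supported in this sketch.
    field_part = ",".join(f"{_escape(k)}={float(v)}" for k, v in sorted(fields.items()))
    return f"{name}{tag_part} {field_part} {timestamp_ns}"


def batches(lines: list, max_lines: int):
    """Yield newline-joined payloads of at most max_lines lines each."""
    for start in range(0, len(lines), max_lines):
        yield "\n".join(lines[start:start + max_lines])


line = to_line(
    "soil",
    {"site": "north field"},
    {"ph": 6.5, "humidity": 40},
    1_700_000_000_000_000_000,
)
payloads = list(batches([line, line, line], max_lines=2))
```

Each payload could then be sent in a single request over whichever protocol the deployment uses. Larger batches cost fewer round trips but delay delivery, so the batch size is a trade-off between bandwidth and latency.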

Source: https://www.influxdata.com/use-cases/iot-and-sensor-data/

 

 

NoSQL Database: The NoSQL database is typically used to address the fast data ingest problem for device data. In some cases, there may be a stream processor (e.g. Storm, Samza or Kinesis) handling data filtering, routing and some lightweight processing, such as counts. The NoSQL database is typically chosen because most SQL databases top out at about 5,000 inserts/second, whereas NoSQL databases can reach up to 50,000 inserts/second. However, NoSQL databases are not designed to handle analytic processing of the data or joins, which are common requirements for Internet of Things applications. NoSQL effectively provides a real-time data ingest engine for data that is then moved to Hadoop using an extract, transform and load (ETL) process. In short: NoSQL is fast at writes, but data analysis and join queries are inconvenient.
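To see what these ceilings mean for sizing, a back-of-the-envelope calculation helps. The sketch below uses the two insert rates quoted above and assumes each sensor sends one reading per reporting interval, evenly spread over time. The sensor counts are illustrative, not measurements.

```python
SQL_INSERTS_PER_SEC = 5_000      # approximate ceiling quoted for most SQL databases
NOSQL_INSERTS_PER_SEC = 50_000   # approximate ceiling quoted for NoSQL databases


def required_inserts_per_second(sensor_count: int, interval_seconds: float) -> float:
    """Average insert rate if every sensor sends one reading per interval."""
    return sensor_count / interval_seconds


def fits(sensor_count: int, interval_seconds: float, ceiling: float) -> bool:
    """True if the average required rate stays at or below the ceiling."""
    return required_inserts_per_second(sensor_count, interval_seconds) <= ceiling


# 600,000 sensors reporting once a minute need 10,000 inserts/second on average.
rate = required_inserts_per_second(600_000, 60)
fits_sql = fits(600_000, 60, SQL_INSERTS_PER_SEC)
fits_nosql = fits(600_000, 60, NOSQL_INSERTS_PER_SEC)
```

Under these assumptions, 300,000 sensors at one reading per minute already reach the 5,000 inserts/second figure, and 600,000 exceed it while staying within the NoSQL figure. Real traffic is bursty, so peak rates will be higher than this average.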

