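I came across this explanation of data science while browsing over a VPN and thought it was quite good, so I translated it; the original English text is kept below.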
Data science is about understanding systems, whether they are natural systems such as the climate or man-made systems such as the economy.
Scientists have been conducting experiments for centuries, but recent advances in technology have enabled us to use data to understand systems at a much larger scale.
Individual data points represent snapshots of a system's behavior, and as you collect more data on that system, you build up a dataset that you can use to analyze and understand the system as a whole. Some examples of datasets include:
A public company's daily stock prices
GPS location data for Uber rides
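To make the idea of data points accumulating into a dataset concrete, here is a minimal Python sketch based on the stock price example. The `PricePoint` type, the `daily_returns` helper, and all prices are made up for illustration; they are not from the original passage or from any real company.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PricePoint:
    day: str      # ISO date of the snapshot
    close: float  # closing price on that day (hypothetical value)


# Each record is one snapshot of the system's behavior; together they form a dataset.
dataset = [
    PricePoint("2024-01-02", 100.0),
    PricePoint("2024-01-03", 102.0),
    PricePoint("2024-01-04", 101.0),
    PricePoint("2024-01-05", 104.0),
]


def daily_returns(points):
    """Fractional change in closing price between consecutive snapshots."""
    return [(b.close - a.close) / a.close for a, b in zip(points, points[1:])]


# Questions about the system as a whole only make sense once points are collected.
returns = daily_returns(dataset)
average_close = mean(p.close for p in dataset)

print("daily returns:", returns)
print("average close:", average_close)
```

A single `PricePoint` says almost nothing on its own; the returns and the average only exist because several snapshots were gathered into one list.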
Familiar software products like Microsoft Excel let you explore data, but they aren't suitable for data science because they:
don't scale to larger datasets.
don't allow you to tweak and run machine learning algorithms.
make it challenging to reproduce your work.
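As a rough illustration of the last two points, the sketch below shows what a scripted analysis can look like: a tiny learning algorithm whose settings can be tweaked, run on data generated from a fixed seed so the result can be reproduced. Everything here is an assumption made for the example (the function names `make_data` and `fit_line`, the synthetic line y = 2x + 1, the learning rate); it is not taken from the original passage.

```python
import random


def make_data(seed, n=50):
    """Generate noisy points around y = 2x + 1 from a fixed seed (synthetic data)."""
    rng = random.Random(seed)
    xs = [i / n for i in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.05) for x in xs]
    return xs, ys


def fit_line(xs, ys, learning_rate=0.5, steps=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Same seed and same settings give the same result on every run.
xs, ys = make_data(seed=42)
w, b = fit_line(xs, ys)
w_again, b_again = fit_line(*make_data(seed=42))

print("fitted slope and intercept:", w, b)
print("identical on rerun:", (w, b) == (w_again, b_again))
```

Changing `learning_rate` or `steps` is a one-line edit, and anyone with the script can rerun it and get the same numbers, which is hard to guarantee with manual spreadsheet steps.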
An image to go with it:
A screenshot from an introductory video on data analysis and data science that I watched a while ago: