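Downloading Datasets with the Kaggle API


Kaggle API Tutorial


The official API for https://www.kaggle.com, accessible through a command line tool implemented in Python 3.

Beta release - Kaggle reserves the right to modify the API functionality currently offered.

Important: competition submissions made with API versions prior to 1.5.0 may not work. If you have trouble submitting to a competition, check your version with kaggle --version. If it is below 1.5.0, update with pip install kaggle --upgrade.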
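The version check described above can also be scripted. The following is a minimal sketch rather than part of the Kaggle tooling: parse_version, needs_upgrade and installed_kaggle_version are hypothetical helper names, and the installed version is read with the standard-library importlib.metadata on the assumption that the package is installed under the distribution name kaggle.

```python
import re
from importlib import metadata

MINIMUM = "1.5.0"  # threshold quoted in the note above


def parse_version(text):
    """Turn '1.5.16' into (1, 5, 16); missing parts are padded with zeros."""
    parts = [int(n) for n in re.findall(r"\d+", text)[:3]]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def needs_upgrade(installed, minimum=MINIMUM):
    """Return True when the installed version is older than the minimum."""
    return parse_version(installed) < parse_version(minimum)


def installed_kaggle_version():
    """Return the installed kaggle package version, or None if it is absent."""
    try:
        return metadata.version("kaggle")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_kaggle_version()
    if version is None:
        print("kaggle is not installed; run: pip install kaggle")
    elif needs_upgrade(version):
        print(f"kaggle {version} is older than {MINIMUM}; run: pip install kaggle --upgrade")
    else:
        print(f"kaggle {version} is recent enough")
```

Comparing integer tuples avoids the pitfall of plain string comparison, where "1.10.0" would sort before "1.5.0".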

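1. Install and Configure the Kaggle Environment

1.1 Install the Kaggle Package

Make sure you have Python 3 and the pip package manager installed.

Run the following command to access the Kaggle API from the command line:

pip install kaggle

(You may need to run pip install --user kaggle on Mac/Linux. This is recommended if problems come up during installation.) Installations done as the root user (i.e. sudo pip install kaggle) will not work correctly unless you understand what you are doing, and even then they might not. A user install is strongly recommended if you hit permission errors.

You can now use the kaggle command as shown in the examples below.

If you run into a kaggle: command not found error, make sure your Python binaries are on your path. You can see where kaggle is installed by running pip uninstall kaggle and checking where the binary is listed (pip asks for confirmation before removing anything, so you can decline). For a local user install on Linux, the default location is ~/.local/bin. On Windows, the default location is $PYTHON_HOME/Scripts.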
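The path check described above can be done without uninstalling anything. This is a small sketch using only the standard library; is_on_path and diagnose are hypothetical helper names, and ~/.local/bin is simply the Linux default mentioned in the text, so adjust it for your platform.

```python
import os
import shutil
from pathlib import Path


def is_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = Path(directory).expanduser()
    entries = [Path(p).expanduser() for p in path_value.split(os.pathsep) if p]
    return target in entries


def diagnose(command="kaggle", user_bin="~/.local/bin"):
    """Report where the command resolves, or whether the user bin dir is missing from PATH."""
    found = shutil.which(command)
    if found:
        return f"{command} found at {found}"
    if not is_on_path(user_bin):
        return f"{command} not found; {user_bin} is not on PATH"
    return f"{command} not found even though {user_bin} is on PATH"


if __name__ == "__main__":
    print(diagnose())
```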

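Important: Python 2 is not supported. Before reporting any issue, make sure you are using Python 3.

1.2 API Token Configuration

To use the Kaggle API, sign up for a Kaggle account at https://www.kaggle.com.

After signing up, log in to Kaggle:

  • Click your avatar in the top-right corner; a sidebar with account settings opens.
  • Click Your Profile to open the settings.
  • On that page, find the API section and click Create New Token. This triggers the download of kaggle.json, a file containing your API credentials.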

Kaggle configuration
Install the Kaggle API on your machine, if you have not already done so:

pip install kaggle

Place this file at ~/.kaggle/kaggle.json.

If that directory does not exist, create a .kaggle folder in your home directory and then put kaggle.json into it:

cd ~
mkdir .kaggle
cd ~/.kaggle/

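(On Windows the location is C:\Users\<Windows-username>\.kaggle\kaggle.json - you can check the exact location, sans drive, with echo %HOMEPATH%.) You can define a shell environment variable KAGGLE_CONFIG_DIR to change this location to $KAGGLE_CONFIG_DIR/kaggle.json (on Windows, %KAGGLE_CONFIG_DIR%\kaggle.json).

For your security, make sure that other users of your computer do not have read access to your credentials. On Unix-based systems you can do this with the following command: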

chmod 600 ~/.kaggle/kaggle.json
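The location and permission rules above can be checked from a script. The sketch below only mirrors what the text describes (the default ~/.kaggle directory, the KAGGLE_CONFIG_DIR override, and chmod 600); it is not the client's own lookup code, and credentials_path, is_private and secure are hypothetical helper names. The permission check is meaningful on Unix-based systems only.

```python
import os
import stat
from pathlib import Path


def credentials_path(env=None):
    """Return where kaggle.json is expected, honouring KAGGLE_CONFIG_DIR if set."""
    env = os.environ if env is None else env
    config_dir = env.get("KAGGLE_CONFIG_DIR")
    base = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    return base / "kaggle.json"


def is_private(path):
    """Return True if group and others have no permissions on the file (Unix)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0


def secure(path):
    """Equivalent of `chmod 600 <path>`: owner read/write only."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


if __name__ == "__main__":
    path = credentials_path()
    if not path.exists():
        print(f"No credentials file at {path}")
    elif is_private(path):
        print(f"{path} is accessible only by its owner")
    else:
        print(f"{path} is accessible by other users; run: chmod 600 {path}")
```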

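You can also choose to export your Kaggle username and token to the environment:

export KAGGLE_USERNAME=datadinosaur
export KAGGLE_KEY=xxxxxxxxxxxxxx

In addition, you can export any other configuration value that would normally live in $HOME/.kaggle/kaggle.json, using the format KAGGLE_<VARIABLE> (note the uppercase).
For example, if the file had the variable "proxy", you would export KAGGLE_PROXY and the client would pick it up.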
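The naming rule above (prefix KAGGLE_, then the key in uppercase) is easy to express in code. The following sketch turns the contents of a kaggle.json-style dictionary into export statements; to_env and export_lines are hypothetical helper names, and the values are the placeholders from the example above, not real credentials.

```python
import json
import shlex


def to_env(config):
    """Map kaggle.json keys to KAGGLE_-prefixed, upper-case variable names."""
    return {f"KAGGLE_{key.upper()}": str(value) for key, value in config.items()}


def export_lines(config):
    """Render shell `export` statements, quoting values where the shell needs it."""
    return [f"export {name}={shlex.quote(value)}" for name, value in to_env(config).items()]


if __name__ == "__main__":
    # Placeholder values taken from the example above, not real credentials.
    sample = json.loads('{"username": "datadinosaur", "key": "xxxxxxxxxxxxxx"}')
    print("\n".join(export_lines(sample)))
```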

2. Using the Kaggle Commands

The command line tool supports the following commands:

kaggle competitions {list, files, download, submit, submissions, leaderboard}
kaggle datasets {list, files, download, create, version, init, metadata, status}
kaggle kernels {list, init, push, pull, output, status}
kaggle config {view, set, unset}

See below for more details on using each of these commands.

2.1 Competitions

The API supports the following commands for Kaggle Competitions.

usage: kaggle competitions [-h]
                           {list,files,download,submit,submissions,leaderboard}
                           ...
optional arguments:
  -h, --help            show this help message and exit
commands:
  {list,files,download,submit,submissions,leaderboard}
    list                List available competitions
    files               List competition files
    download            Download competition files
    submit              Make a new competition submission
    submissions         Show your competition submissions
    leaderboard         Get competition leaderboard information
2.1.1 List competitions
usage: kaggle competitions list [-h] [--group GROUP] [--category CATEGORY] [--sort-by SORT_BY] [-p PAGE] [-s SEARCH] [-v]

optional arguments:
  -h, --help            show this help message and exit
  --group GROUP         Search for competitions in a specific group. Default is 'general'. Valid options are 'general', 'entered', and 'inClass'
  --category CATEGORY   Search for competitions of a specific category. Default is 'all'. Valid options are 'all', 'featured', 'research', 'recruitment', 'gettingStarted', 'masters', and 'playground'
  --sort-by SORT_BY     Sort list results. Default is 'latestDeadline'. Valid options are 'grouped', 'prize', 'earliestDeadline', 'latestDeadline', 'numberOfTeams', and 'recentlyCreated'
  -p PAGE, --page PAGE  Page number for results paging. Page size is 20 by default 
  -s SEARCH, --search SEARCH
                        Term(s) to search for
  -v, --csv             Print results in CSV format
                        (if not set print in table format)

Examples:

kaggle competitions list -s health
kaggle competitions list --category gettingStarted
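The -v / --csv flag makes the listing easy to consume from a script. The sketch below parses CSV text into dictionaries with the standard csv module; parse_csv and run_csv_command are hypothetical helper names. It assumes the command writes nothing but CSV to standard output, and it makes no assumption about which columns the CLI prints, since those may differ between versions.

```python
import csv
import io
import subprocess


def parse_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def run_csv_command(argv):
    """Run a CLI command that prints CSV and return its rows.

    Assumes the command writes only CSV to stdout; any extra banner or
    warning lines would need to be stripped first.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return parse_csv(result.stdout)
```

For example, run_csv_command(["kaggle", "competitions", "list", "--category", "gettingStarted", "-v"]) would return one dictionary per listed competition, keyed by whatever header row the CLI emits.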
2.1.2 List competition files
usage: kaggle competitions files [-h] [-v] [-q] [competition]

optional arguments:
  -h, --help   show this help message and exit
  competition  Competition URL suffix (use "kaggle competitions list" to show options)
               If empty, the default competition will be used (use "kaggle config set competition")"
  -v, --csv    Print results in CSV format (if not set print in table format)
  -q, --quiet  Suppress printing information about the upload/download progress

Examples:

kaggle competitions files favorita-grocery-sales-forecasting
2.1.3 Download competition files
usage: kaggle competitions download [-h] [-f FILE_NAME] [-p PATH] [-w] [-o]
                                    [-q]
                                    [competition]

optional arguments:
  -h, --help            show this help message and exit
  competition           Competition URL suffix (use "kaggle competitions list" to show options)
                        If empty, the default competition will be used (use "kaggle config set competition")"
  -f FILE_NAME, --file FILE_NAME
                        File name, all files downloaded if not provided
                        (use "kaggle competitions files -c <competition>" to show options)
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -w, --wp              Download files to current working path
  -o, --force           Skip check whether local version of file is up to date, force file download
  -q, --quiet           Suppress printing information about the upload/download progress

Examples:

kaggle competitions download favorita-grocery-sales-forecasting
kaggle competitions download favorita-grocery-sales-forecasting -f test.csv.7z

Note: you will need to accept the competition rules at https://www.kaggle.com/c/<competition-name>/rules.
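When downloads are driven from a script, it helps to build the command as an argument list instead of concatenating strings. The sketch below covers only the flags documented in the usage text above (-f, -p, -o, -q); build_download_command is a hypothetical helper name, not part of the Kaggle package.

```python
import shlex


def build_download_command(competition, file_name=None, path=None, force=False, quiet=False):
    """Build the argv list for `kaggle competitions download`."""
    argv = ["kaggle", "competitions", "download", competition]
    if file_name:
        argv += ["-f", file_name]  # a single file instead of all files
    if path:
        argv += ["-p", path]       # folder to download into
    if force:
        argv.append("-o")          # skip the up-to-date check
    if quiet:
        argv.append("-q")          # suppress progress output
    return argv


if __name__ == "__main__":
    cmd = build_download_command("favorita-grocery-sales-forecasting", file_name="test.csv.7z")
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the competition rules have been accepted; passing a list avoids shell quoting problems with file names.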

2.1.4 Submit to a competition
usage: kaggle competitions submit [-h] -f FILE_NAME -m MESSAGE [-q]
                                  [competition]

required arguments:
  -f FILE_NAME, --file FILE_NAME
                        File for upload (full path)
  -m MESSAGE, --message MESSAGE
                        Message describing this submission

optional arguments:
  -h, --help            show this help message and exit
  competition           Competition URL suffix (use "kaggle competitions list" to show options)
                        If empty, the default competition will be used (use "kaggle config set competition")"
  -q, --quiet           Suppress printing information about the upload/download progress

Examples:

kaggle competitions submit favorita-grocery-sales-forecasting -f sample_submission_favorita.csv.7z -m "My submission message"

Note: you will need to accept the competition rules at https://www.kaggle.com/c/<competition-name>/rules.

2.1.5 List competition submissions
usage: kaggle competitions submissions [-h] [-v] [-q] [competition]

optional arguments:
  -h, --help   show this help message and exit
  competition  Competition URL suffix (use "kaggle competitions list" to show options)
               If empty, the default competition will be used (use "kaggle config set competition")"
  -v, --csv    Print results in CSV format (if not set print in table format)
  -q, --quiet  Suppress printing information about the upload/download progress

Examples:
kaggle competitions submissions favorita-grocery-sales-forecasting

Note: you will need to accept the competition rules at https://www.kaggle.com/c/<competition-name>/rules.

2.1.6 Get the competition leaderboard
usage: kaggle competitions leaderboard [-h] [-s] [-d] [-p PATH] [-v] [-q]
                                       [competition]

optional arguments:
  -h, --help            show this help message and exit
  competition           Competition URL suffix (use "kaggle competitions list" to show options)
                        If empty, the default competition will be used (use "kaggle config set competition")"
  -s, --show            Show the top of the leaderboard
  -d, --download        Download entire leaderboard
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -v, --csv             Print results in CSV format (if not set print in table format)
  -q, --quiet           Suppress printing information about the upload/download progress

Examples:

kaggle competitions leaderboard favorita-grocery-sales-forecasting -s

2.2 Datasets

The API supports the following commands for Kaggle Datasets.

usage: kaggle datasets [-h]
                       {list,files,download,create,version,init,metadata,status} ...

optional arguments:
  -h, --help            show this help message and exit

commands:
  {list,files,download,create,version,init,metadata, status}
    list                List available datasets
    files               List dataset files
    download            Download dataset files
    create              Create a new dataset
    version             Create a new dataset version
    init                Initialize metadata file for dataset creation
    metadata            Download metadata about a dataset
    status              Get the creation status for a dataset
2.2.1 List datasets
usage: kaggle datasets list [-h] [--sort-by SORT_BY] [--size SIZE] [--file-type FILE_TYPE] [--license LICENSE_NAME] [--tags TAG_IDS] [-s SEARCH] [-m] [--user USER] [-p PAGE] [-v]

optional arguments:
  -h, --help            show this help message and exit
  --sort-by SORT_BY     Sort list results. Default is 'hottest'. Valid options are 'hottest', 'votes', 'updated', and 'active'
  --size SIZE           Search for datasets of a specific size. Default is 'all'. Valid options are 'all', 'small', 'medium', and 'large'
  --file-type FILE_TYPE Search for datasets with a specific file type. Default is 'all'. Valid options are 'all', 'csv', 'sqlite', 'json', and 'bigQuery'. Please note that bigQuery datasets cannot be downloaded
  --license LICENSE_NAME 
                        Search for datasets with a specific license. Default is 'all'. Valid options are 'all', 'cc', 'gpl', 'odb', and 'other'
  --tags TAG_IDS        Search for datasets that have specific tags. Tag list should be comma separated                      
  -s SEARCH, --search SEARCH
                        Term(s) to search for
  -m, --mine            Display only my items
  --user USER           Find public datasets owned by a specific user or organization
  -p PAGE, --page PAGE  Page number for results paging. Page size is 20 by default
  -v, --csv             Print results in CSV format (if not set print in table format)

Examples:

kaggle datasets list -s demographics
kaggle datasets list --sort-by votes
2.2.2 List files for a dataset
usage: kaggle datasets files [-h] [-v] [dataset]

optional arguments:
  -h, --help  show this help message and exit
  dataset     Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options)
  -v, --csv   Print results in CSV format (if not set print in table format)

Examples:

kaggle datasets files zillow/zecon
2.2.3 Download dataset files
usage: kaggle datasets download [-h] [-f FILE_NAME] [-p PATH] [-w] [--unzip]
                                [-o] [-q]
                                [dataset]

optional arguments:
  -h, --help            show this help message and exit
  dataset               Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options)
  -f FILE_NAME, --file FILE_NAME
                        File name, all files downloaded if not provided
                        (use "kaggle datasets files -d <dataset>" to show options)
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -w, --wp              Download files to current working path
  --unzip               Unzip the downloaded file. Will delete the zip file when completed.
  -o, --force           Skip check whether local version of file is up to date, force file download
  -q, --quiet           Suppress printing information about the upload/download progress

Examples:

kaggle datasets download zillow/zecon
kaggle datasets download zillow/zecon -f State_time_series.csv

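Please note that BigQuery datasets cannot be downloaded.

On the dataset's page on Kaggle, find the API command and copy it to the clipboard.

For example, for the dataset cisautomotiveapi/large-car-dataset the command is: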

kaggle datasets download -d cisautomotiveapi/large-car-dataset
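The usage text above describes --unzip as extracting the downloaded archive and then deleting it. If a dataset was downloaded without that flag, the same result can be approximated afterwards with the standard zipfile module. This is a sketch under that assumption; extract_and_remove is a hypothetical helper name, and the archive path is whatever file the download actually produced.

```python
import zipfile
from pathlib import Path


def extract_and_remove(archive, destination=None):
    """Extract a zip archive, delete it, and return the extracted member names."""
    archive = Path(archive)
    destination = Path(destination) if destination else archive.parent
    destination.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        zf.extractall(destination)
    archive.unlink()  # mirror the documented --unzip behaviour of deleting the zip
    return names
```

The archive is removed only after extraction has finished, so a corrupt download raises an error and leaves the zip file in place for inspection.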

2.2.4 Initialize a metadata file for dataset creation
usage: kaggle datasets init [-h] [-p FOLDER]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder for upload, containing data files and a special dataset-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Dataset-Metadata). Defaults to current working directory

Examples:

kaggle datasets init -p /path/to/dataset
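2.2.5 Create a new dataset

If you want to create a new dataset, you first need to initialize the metadata file. You can do so by running kaggle datasets init as described above.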

usage: kaggle datasets create [-h] [-p FOLDER] [-u] [-q] [-t] [-r {skip,zip,tar}]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder for upload, containing data files and a special dataset-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Dataset-Metadata). Defaults to current working directory
  -u, --public          Create publicly (default is private)
  -q, --quiet           Suppress printing information about the upload/download progress
  -t, --keep-tabular    Do not convert tabular files to CSV (default is to convert)
  -r {skip,zip,tar}, --dir-mode {skip,zip,tar}
                        What to do with directories: "skip" - ignore; "zip" - compressed upload; "tar" - uncompressed upload

Examples:

kaggle datasets create -p /path/to/dataset
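The init-then-create workflow above can be prepared from a script. The sketch below writes a dataset-metadata.json by hand and builds the create command. The field names (title, id, licenses) are an assumption based on the Dataset-Metadata wiki page linked in the usage text, so compare the result with the file that kaggle datasets init generates before relying on it. write_dataset_metadata and build_create_command are hypothetical helper names.

```python
import json
from pathlib import Path


def write_dataset_metadata(folder, owner, slug, title, license_name="CC0-1.0"):
    """Write dataset-metadata.json into `folder` and return its path.

    The field names are an assumption based on the Dataset-Metadata wiki;
    compare with the file that `kaggle datasets init` generates.
    """
    metadata = {
        "title": title,
        "id": f"{owner}/{slug}",
        "licenses": [{"name": license_name}],
    }
    target = Path(folder) / "dataset-metadata.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return target


def build_create_command(folder, public=False):
    """Build the argv list for `kaggle datasets create`."""
    argv = ["kaggle", "datasets", "create", "-p", str(folder)]
    if public:
        argv.append("-u")  # create publicly (default is private)
    return argv
```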
2.2.6 Create a new dataset version
usage: kaggle datasets version [-h] -m VERSION_NOTES [-p FOLDER] [-q] [-t]
                               [-r {skip,zip,tar}] [-d]

required arguments:
  -m VERSION_NOTES, --message VERSION_NOTES
                        Message describing the new version

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder for upload, containing data files and a special dataset-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Dataset-Metadata). Defaults to current working directory
  -q, --quiet           Suppress printing information about the upload/download progress
  -t, --keep-tabular    Do not convert tabular files to CSV (default is to convert)
  -r {skip,zip,tar}, --dir-mode {skip,zip,tar}
                        What to do with directories: "skip" - ignore; "zip" - compressed upload; "tar" - uncompressed upload
  -d, --delete-old-versions
                        Delete old versions of this dataset

Examples:

kaggle datasets version -p /path/to/dataset -m "Updated data"
2.2.7 Download metadata for an existing dataset
usage: kaggle datasets metadata [-h] [-p PATH] [dataset]

optional arguments:
  -h, --help            show this help message and exit
  dataset               Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options)
  -p PATH, --path PATH  Location to download dataset metadata to. Defaults to current working directory

Examples:

kaggle datasets metadata -p /path/to/download zillow/zecon
2.2.8 Get the dataset creation status
usage: kaggle datasets status [-h] [dataset]

optional arguments:
  -h, --help  show this help message and exit
  dataset     Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options)

Examples:

kaggle datasets status zillow/zecon

2.3 Kernels

The API supports the following commands for Kaggle Kernels.

usage: kaggle kernels [-h] {list,init,push,pull,output,status} ...

optional arguments:
  -h, --help            show this help message and exit

commands:
  {list,init,push,pull,output,status}
    list                List available kernels
    init                Initialize metadata file for a kernel
    push                Push new code to a kernel and run the kernel
    pull                Pull down code from a kernel
    output              Get data output from the latest kernel run
    status              Display the status of the latest kernel run
2.3.1 List kernels
usage: kaggle kernels list [-h] [-m] [-p PAGE] [--page-size PAGE_SIZE] [-s SEARCH] [-v]
                           [--parent PARENT] [--competition COMPETITION]
                           [--dataset DATASET]
                           [--user USER] [--language LANGUAGE]
                           [--kernel-type KERNEL_TYPE]
                           [--output-type OUTPUT_TYPE] [--sort-by SORT_BY]

optional arguments:
  -h, --help            show this help message and exit
  -m, --mine            Display only my items
  -p PAGE, --page PAGE  Page number for results paging. Page size is 20 by default
  --page-size PAGE_SIZE Number of items to show on a page. Default size is 20, max is 100
  -s SEARCH, --search SEARCH
                        Term(s) to search for
  -v, --csv             Print results in CSV format (if not set print in table format)
  --parent PARENT       Find children of the specified parent kernel
  --competition COMPETITION
                        Find kernels for a given competition
  --dataset DATASET     Find kernels for a given dataset
  --user USER           Find kernels created by a given user
  --language LANGUAGE   Specify the language the kernel is written in. Default is 'all'. Valid options are 'all', 'python', 'r', 'sqlite', and 'julia'
  --kernel-type KERNEL_TYPE
                        Specify the type of kernel. Default is 'all'. Valid options are 'all', 'script', and 'notebook'
  --output-type OUTPUT_TYPE
                        Search for specific kernel output types. Default is 'all'. Valid options are 'all', 'visualizations', and 'data'
  --sort-by SORT_BY     Sort list results. Default is 'hotness'.  Valid options are 'hotness', 'commentCount', 'dateCreated', 'dateRun', 'relevance', 'scoreAscending', 'scoreDescending', 'viewCount', and 'voteCount'. 'relevance' is only applicable if a search term is specified.

Examples:

kaggle kernels list -s titanic
kaggle kernels list --language python
2.3.2 Initialize a metadata file for a kernel
usage: kaggle kernels init [-h] [-p FOLDER]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder for upload, containing data files and a special kernel-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Kernel-Metadata). Defaults to current working directory

Examples:

kaggle kernels init -p /path/to/kernel
2.3.3 Push a kernel
usage: kaggle kernels push [-h] -p FOLDER

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder for upload, containing data files and a special kernel-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Kernel-Metadata). Defaults to current working directory

Examples:

kaggle kernels push -p /path/to/kernel
2.3.4 Pull a kernel
usage: kaggle kernels pull [-h] [-p PATH] [-w] [-m] [kernel]

optional arguments:
  -h, --help            show this help message and exit
  kernel                Kernel URL suffix in format <owner>/<kernel-name> (use "kaggle kernels list" to show options)
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -w, --wp              Download files to current working path
  -m, --metadata        Generate metadata when pulling kernel

Examples:

kaggle kernels pull rtatman/list-of-5-day-challenges -p /path/to/dest
2.3.5 Retrieve a kernel's output
usage: kaggle kernels output [-h] [-p PATH] [-w] [-o] [-q] [kernel]

optional arguments:
  -h, --help            show this help message and exit
  kernel                Kernel URL suffix in format <owner>/<kernel-name> (use "kaggle kernels list" to show options)
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -w, --wp              Download files to current working path
  -o, --force           Skip check whether local version of file is up to date, force file download
  -q, --quiet           Suppress printing information about the upload/download progress

Examples:

kaggle kernels output mrisdal/exploring-survival-on-the-titanic -p /path/to/dest
2.3.6 Get the status of the latest kernel run
usage: kaggle kernels status [-h] [kernel]

optional arguments:
  -h, --help  show this help message and exit
  kernel      Kernel URL suffix in format <owner>/<kernel-name> (use "kaggle kernels list" to show options)

Examples:

kaggle kernels status mrisdal/exploring-survival-on-the-titanic

2.4 Config

The API supports the following commands for configuration.

usage: kaggle config [-h] {view,set,unset} ...

optional arguments:
  -h, --help        show this help message and exit

commands:
  {view,set,unset}
    view            View current config values
    set             Set a configuration value
    unset           Clear a configuration value
2.4.1 Set the download path
usage: kaggle config path [-h] [-p PATH]

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  folder where file(s) will be downloaded, defaults to current working directory

Examples:

kaggle config path -p C:\
2.4.2 View current config values
usage: kaggle config view [-h]

optional arguments:
  -h, --help  show this help message and exit

Examples:

kaggle config view
2.4.3 Set a configuration value
usage: kaggle config set [-h] -n NAME -v VALUE

required arguments:
  -n NAME, --name NAME  Name of the configuration parameter
                        (one of competition, path, proxy)
  -v VALUE, --value VALUE
                        Value of the configuration parameter, valid values depending on name
                        - competition: Competition URL suffix (use "kaggle competitions list" to show options)
                        - path: Folder where file(s) will be downloaded, defaults to current working directory
                        - proxy: Proxy for HTTP requests

Examples:

kaggle config set -n competition -v titanic
2.4.4 Clear a configuration value
usage: kaggle config unset [-h] -n NAME

required arguments:
  -n NAME, --name NAME  Name of the configuration parameter
                        (one of competition, path, proxy)

Examples:

kaggle config unset -n competition
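Since the help text limits the parameter name to competition, path or proxy, a script that wraps these commands can reject anything else before calling the CLI. The following is a minimal sketch with hypothetical helper names (build_config_set, build_config_unset); it only builds argument lists and does not run them.

```python
VALID_NAMES = ("competition", "path", "proxy")  # names listed in the help text above


def _check_name(name):
    """Raise ValueError unless `name` is a documented configuration parameter."""
    if name not in VALID_NAMES:
        raise ValueError(f"unknown parameter {name!r}; expected one of {VALID_NAMES}")


def build_config_set(name, value):
    """Build the argv list for `kaggle config set -n NAME -v VALUE`."""
    _check_name(name)
    return ["kaggle", "config", "set", "-n", name, "-v", value]


def build_config_unset(name):
    """Build the argv list for `kaggle config unset -n NAME`."""
    _check_name(name)
    return ["kaggle", "config", "unset", "-n", name]
```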

