Add more popular datasets to graphscope built-in datasets · alibaba/GraphScope#1015

Repository metrics

Stars: 2,401
PR merge metrics: Avg merge 1m; 7 merged PRs in 30d

Description

GraphScope has several built-in datasets that can be loaded in one line. The data files are located in the dataset directory of the Aliyun OSS bucket graphscope, and the corresponding utility functions that load them are located in python/graphscope/dataset/. We plan to keep enriching the datasets.

The procedure for adding a new dataset:

Find a popular and appropriate dataset, and adapt its format to a property graph if necessary.
Put all data files inside a folder and give the folder a meaningful name.
Compress the folder, then upload the compressed file together with the original folder to the dataset folder of the OSS bucket. For example, given a folder named foo/ containing two files, foo/nodes.csv and foo/edge.csv, the bucket will have the following file structure after this step:

dataset
|-- foo.tar.gz
|-- foo
    |-- nodes.csv
    |-- edge.csv

Write the loading function load_foo in a new file named python/graphscope/dataset/foo.py.
A corresponding unit test is appreciated!
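
The compression and loading steps above might be sketched as follows. This is a minimal illustration, not the project's actual tooling: package_dataset is a hypothetical helper, the "node" and "edge" labels are invented for the example, and load_foo assumes that nodes.csv has a header row with the vertex id in the first column and that edge.csv has a header row with source and destination ids in the first two columns. The signature load_foo(sess, prefix) is also an assumption; check the existing files in python/graphscope/dataset/ for the convention the project currently uses. Uploading to the OSS bucket is left out because it depends on credentials and tooling not described here.

```python
import os
import tarfile


def package_dataset(folder):
    """Compress `folder` (e.g. "foo") into "<folder>.tar.gz" next to it.

    Hypothetical helper: returns the path of the archive it wrote.
    """
    folder = os.path.normpath(folder)
    archive = folder + ".tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps "foo/" as the top-level entry, so extracting the
        # archive recreates the original folder layout.
        tar.add(folder, arcname=os.path.basename(folder))
    return archive


def load_foo(sess, prefix, directed=True):
    """Sketch of a loader for the example dataset "foo".

    Assumptions (not taken from the issue): both CSV files have a header
    row, the vertex id is column 0 of nodes.csv, and the source and
    destination ids are columns 0 and 1 of edge.csv.
    """
    # Imported lazily so package_dataset can be used without GraphScope
    # installed; a real dataset module would likely import at the top.
    from graphscope.framework.loader import Loader

    prefix = os.path.expandvars(prefix)
    graph = sess.g(directed=directed)
    graph = graph.add_vertices(
        Loader(os.path.join(prefix, "nodes.csv"), header_row=True, delimiter=","),
        label="node",  # hypothetical vertex label
    )
    graph = graph.add_edges(
        Loader(os.path.join(prefix, "edge.csv"), header_row=True, delimiter=","),
        label="edge",  # hypothetical edge label
        src_label="node",
        dst_label="node",
    )
    return graph
```

Calling package_dataset("foo") from the directory that contains foo/ writes foo.tar.gz alongside it, which gives the two items shown in the bucket layout above. A unit test for load_foo could load the extracted folder with a local session and check that the resulting graph's schema contains the expected vertex and edge labels.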

Contributor guide

Research direction: Locate a popular graph dataset, convert it to property graph format (nodes.csv, edges.csv), compress it to tar.gz, upload it to the OSS bucket, implement a load function in Python, and add a unit test.
Tech stack: python
Domain: backend, data
Issue type: Feature
Difficulty: 3
Estimated time: 1-2 days
Activity status: Active
Clarity: Clear
Prerequisites: Python, Git
Newbie friendliness: 80
