⭐ Repository Template for Datasets

View the Project on GitHub opprDev/repo-template-datasets

Datasets Repository Template

License: AGPL v3

Datasets are fostering innovation in higher-level functions for everyone, everywhere. This repository provides a template for dataset repositories, and by providing it we hope to encourage the research community to focus on hard problems. It was developed by a community of contributors under the opprDev team from oppr. The template will be used in the BreastScreening, MIDA, and MIMBCD-UI projects.



We kindly ask that scientific works and studies using this repository cite it in their associated publications. Similarly, we ask that open-source and closed-source works using this repository notify us of that use.

You can cite our work using the following BibTeX entry:

@misc{calisto2020repotemplatedatasets,
  doi = {10.5281/ZENODO.3738763},
  url = {},
  author = {Calisto,  Francisco Maria},
  title = {opprDev/repo-template-datasets: v0.1.1-alpha},
  publisher = {Zenodo},
  year = {2020}
}



The following list shows the dependencies required to run this project locally:

If needed, here are some tutorials and documentation to help you feel more comfortable using and experimenting with this repository:


Usage

Follow the instructions below to set up this repository and extract its data. The following steps explain how this repository is used.


At present, this repository can only be installed manually. Eventually, it will be available through pip or another package manager, as mentioned in the roadmap.

Nonetheless, manual installation is as simple as cloning this repository, which virtually any Git or GitHub client can do. From the console, you can use the command below, although other methods work just as well.

git clone

Optionally, the module directory can be installed for a given Python interpreter by moving it into that interpreter's site-packages directory.
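A minimal sketch of this optional step, assuming the module lives in a single directory inside the clone. The helper names `site_packages_dir` and `install_module_dir` are hypothetical and not part of this repository. The sketch locates the running interpreter's site-packages with the standard library's `sysconfig`, and it copies rather than moves the directory so the clone stays intact.

```python
import shutil
import sysconfig
from pathlib import Path
from typing import Optional


def site_packages_dir() -> Path:
    """Return the pure-Python site-packages directory of the running interpreter."""
    return Path(sysconfig.get_paths()["purelib"])


def install_module_dir(module_dir: str, target: Optional[str] = None) -> Path:
    """Copy a module directory into site-packages, or into `target` if given.

    Hypothetical helper: copying (not moving) keeps the cloned repository intact.
    """
    src = Path(module_dir).resolve()
    if not src.is_dir():
        raise NotADirectoryError(f"{src} is not a directory")
    dest_root = Path(target) if target is not None else site_packages_dir()
    dest = dest_root / src.name
    # copytree raises FileExistsError if `dest` already exists,
    # so a previously installed copy is never silently overwritten.
    shutil.copytree(src, dest)
    return dest
```

For example, `install_module_dir("path/to/module")` would copy that directory into the current interpreter's site-packages; writing there may require elevated permissions outside a virtual environment.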


Please feel free to try out our demo, a script in the src/ directory. It can be used as follows:

python src/

Keep in mind that this is only a demo: it does nothing more than download data to an arbitrary destination directory, and only if that directory does not exist or is empty. We did our best to make the demo as user-friendly as possible, so, above all, have fun! 😁
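The demo's "download only if the destination is missing or empty" rule can be sketched as follows. This is an illustration, not the demo's actual code (its script name and data source are not given here): `needs_download`, `fetch_url`, and `download_if_needed` are hypothetical names, and the fetcher is injectable so the logic can be exercised without network access.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Callable


def needs_download(dest_dir: Path) -> bool:
    """True if dest_dir does not exist, or exists but has no entries."""
    return not dest_dir.exists() or not any(dest_dir.iterdir())


def fetch_url(url: str, out_file: Path) -> None:
    """Stream `url` into `out_file` using only the standard library."""
    with urllib.request.urlopen(url) as response, open(out_file, "wb") as fh:
        shutil.copyfileobj(response, fh)


def download_if_needed(
    url: str,
    dest_dir: Path,
    filename: str,
    fetch: Callable[[str, Path], None] = fetch_url,
) -> bool:
    """Download `url` to dest_dir/filename unless dest_dir already has content.

    Returns True if a download happened, False if it was skipped.
    """
    if not needs_download(dest_dir):
        return False
    dest_dir.mkdir(parents=True, exist_ok=True)
    fetch(url, dest_dir / filename)
    return True
```

Skipping any non-empty directory is deliberately coarse: it never re-downloads over existing data, but it also cannot tell a complete download from a partial one.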


CII Best Practices

To fulfil the repository's goal, we need to address the information above, so it is of chief importance that the solution supported by this repository can scale. The repository follows best practices, meeting the Core Infrastructure Initiative (CII) Best Practices criteria.

In addition, one of our goals is to create a configuration file that automatically tests our code and publishes it to pip or another package manager, most likely using GitHub Actions. Further goals may be added here in the future.


This project exists thanks to all the people who contribute. We welcome everyone who wants to help us improve this repository. Some suggestions follow.


Whether something seems to be missing or you need support, just open a new issue. Whether it is a simple request or a fully structured feature proposal, we will do our best to understand it and, eventually, address it.


We like to develop, but we also like to collaborate. You could ask us to add some features, or you could implement them yourself by forking this repository, or even start a side project of your own. In either of the latter cases, please let us share some insights about what we currently have.


This section summarizes the important items of this repository and the fundamental resources behind it.

The following list presents the repositories related to this one:

Dataset Resources

To publish our datasets, we used Kaggle, a well-known platform. To access our project's profile page, just follow the link.
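As a hedged sketch of fetching a dataset published on Kaggle programmatically, the snippet below uses the official `kaggle` Python package (installed with `pip install kaggle`, with API credentials in `~/.kaggle/kaggle.json` or the `KAGGLE_USERNAME`/`KAGGLE_KEY` environment variables). The project's dataset slug is not stated here, so `<owner>/<dataset>` is a placeholder, and the helpers `split_dataset_slug` and `download_kaggle_dataset` are hypothetical.

```python
import re
from typing import Tuple

# Hypothetical validation of the "<owner>/<dataset>" form that Kaggle uses.
SLUG_PATTERN = re.compile(r"^([A-Za-z0-9_-]+)/([A-Za-z0-9_.-]+)$")


def split_dataset_slug(slug: str) -> Tuple[str, str]:
    """Split '<owner>/<dataset>' into its two parts, rejecting malformed input."""
    match = SLUG_PATTERN.match(slug)
    if match is None:
        raise ValueError(f"expected '<owner>/<dataset>', got {slug!r}")
    return match.group(1), match.group(2)


def download_kaggle_dataset(slug: str, dest: str = "data") -> None:
    """Download and unzip a Kaggle dataset into `dest`."""
    split_dataset_slug(slug)  # fail fast before touching the network
    # Imported lazily so the validation above works without the package installed.
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()  # reads ~/.kaggle/kaggle.json or KAGGLE_USERNAME/KAGGLE_KEY
    api.dataset_download_files(slug, path=dest, unzip=True)
```

Calling `download_kaggle_dataset("<owner>/<dataset>")` with the real slug would be roughly equivalent to running `kaggle datasets download -d <owner>/<dataset> --unzip` from the console.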

Copyright © 2019 oppr


The repo-template-datasets repository is distributed under the terms of the GNU AGPLv3 license and the CC-BY-SA-4.0 license. The permissions of these licenses are conditioned on making the complete elements of licensed works and modifications from this repository, including larger works that use a licensed work, available under the same license. Copyright and license notices must be preserved.


Our team brings everything together by sharing ideas and a common purpose, developing ever better work. This section lists the people who are important to this repository, along with their respective links.




Ours is a non-profit organization. However, our activities have many needs, from infrastructure to services, and we need time, contributions, and help to support our team and projects.


This project exists thanks to all the people who contribute. [Contribute].


Thank you to all our backers! 🙏 [Become a backer]


Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [Become a sponsor]

fct fccn ulisboa ist hff


dei dei


sipg isr larsys iti inesc-id


eu pt