“My work relies on bringing together existing observations on animals to answer macro-evolutionary questions. My workflow revolves around finding, processing, and depositing data.”
A useful tool for structuring the search for existing data, and for recording that search so it can be reproduced, is the PRISMA statement. Its guidelines help to keep a record of the process and to document all decisions. The main places I search for data are Google tools (search, scholar, books, data). While alternatives exist (e.g. PubMed instead of Google Scholar, Ecosia or DuckDuckGo instead of Google Search), I have found Google's algorithms to be more effective; for example, Google Scholar has a broader definition of what counts as a scientific article and also includes theses and other forms of reports (for my search strategy see here). Since more and more data are now being deposited in databases, there are tools to search these directly (see here for a list of these in ecology/evolution), and many databases can be accessed directly through R using rOpenSci packages.
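As a minimal sketch of what keeping such a record can look like, the example below logs one row per record found, with the stage it reached and the decision taken, and then recomputes the counts needed for a PRISMA-style flow diagram. It is written in Python purely for illustration (the same could be done in R); the field names, stage labels, and example records are hypothetical and are not prescribed by PRISMA.

```python
import csv
import io
from collections import Counter

# Hypothetical screening log: one row per record found during the search.
# Field names and stage labels are assumptions made for this illustration.
records = [
    {"id": "rec1", "source": "Google Scholar", "stage": "screening",
     "decision": "exclude", "reason": "no data on focal trait"},
    {"id": "rec2", "source": "Google Scholar", "stage": "eligibility",
     "decision": "exclude", "reason": "sample size not reported"},
    {"id": "rec3", "source": "Google Books", "stage": "included",
     "decision": "include", "reason": ""},
    {"id": "rec4", "source": "database search", "stage": "included",
     "decision": "include", "reason": ""},
]


def flow_counts(rows):
    """Count records per (stage, decision) pair for a flow diagram."""
    return Counter((row["stage"], row["decision"]) for row in rows)


def to_csv(rows):
    """Serialise the log as plain-text CSV so it can be archived with the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


counts = flow_counts(records)
log_text = to_csv(records)
print(counts)
print(log_text)
```

Keeping the reason for every exclusion in the same file means the flow-diagram numbers can always be regenerated from the log rather than counted by hand.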
Data are best stored as plain text, so that they can be opened across different systems and software (for advice see here). Data should be organised in tables, with clear headers and without duplication. Files can be stored on ownCloud or OSF; for larger collaborative projects, Google Sheets is a helpful alternative. I perform all data manipulation in R.
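The points about clear headers and avoiding duplication can be checked automatically. The sketch below, again in Python for illustration only, reads a plain-text CSV table and reports empty or repeated column headers, rows with the wrong number of fields, and duplicated rows. The function name check_table and the example table are hypothetical.

```python
import csv
import io


def check_table(csv_text):
    """Return a list of problems found in a plain-text CSV table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["empty file"]

    header, body = rows[0], rows[1:]
    problems = []

    if any(not name.strip() for name in header):
        problems.append("empty column header")
    if len(set(header)) != len(header):
        problems.append("duplicated column header")

    seen = set()
    # Line numbers start at 2 because line 1 is the header.
    for line_number, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_number}: expected {len(header)} fields, "
                f"found {len(row)}"
            )
        key = tuple(row)
        if key in seen:
            problems.append(f"line {line_number}: duplicated row")
        seen.add(key)

    return problems


# Hypothetical example table with one duplicated observation.
example = (
    "species,body_mass_g,source\n"
    "Pan troglodytes,45000,ref1\n"
    "Pan troglodytes,45000,ref1\n"
)
print(check_table(example))
```

A check like this is cheap to run before every deposit, and an empty list of problems is a reasonable minimum standard for a table that others are expected to reuse.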
The main consideration for data deposition is reusability. I try to arrange my data to meet the FAIR principles (findable, accessible, interoperable, reusable). To achieve this, I use Morpho, a tool developed by NCEAS, which forces me to add metadata systematically, describing how the data were collected and explicitly describing each variable. Using its format ensures that my data can be found through keyword or geographic searches. I deposit my data on the Knowledge Network for Biocomplexity. I prefer this over the Max Planck repository Edmond or the European repository Zenodo, since these do not have the same metadata facilities and do not (yet) seem to be linked to broader search engines (such as DataONE or Google Data).
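The idea of explicitly describing each variable can be illustrated with a small check that every column in a table has a description and a unit. This is only a sketch in Python under stated assumptions: it does not reproduce the metadata format that Morpho writes, and the function name missing_metadata, the required fields, and the example data dictionary are all hypothetical.

```python
def missing_metadata(columns, dictionary, required=("description", "unit")):
    """Map each column to the required metadata fields it lacks."""
    gaps = {}
    for column in columns:
        entry = dictionary.get(column, {})
        absent = [field for field in required
                  if not str(entry.get(field, "")).strip()]
        if absent:
            gaps[column] = absent
    return gaps


# Hypothetical table columns and data dictionary.
columns = ["species", "body_mass_g", "latitude"]
data_dictionary = {
    "species": {"description": "Binomial species name", "unit": "none"},
    "body_mass_g": {"description": "Mean adult body mass", "unit": "gram"},
    "latitude": {"description": "Latitude of the study site"},  # unit missing
}

gaps = missing_metadata(columns, data_dictionary)
print(gaps)
```

A dedicated metadata editor enforces this kind of completeness by design; the sketch only shows the underlying rule that no variable should be deposited without a description.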