Management of research data at Porto University: from research needs to curation workflows supported on a data repository

Academic year: 2021

(1)

Management of research data at U.Porto: from researcher needs to curation workflows supported on a data repository

Eugénia Matos Fernandes – efernand@reit.up.pt
Cristina Ribeiro – mcr@fe.up.pt
João Correia Lopes – jlopes@fe.up.pt
João Rocha da Silva – joaorosilva@gmail.com

RECOLECTA Webinar, December 19, 2011

(2)

Contents

§  U.Porto: a research university

§  The U.Porto Information System and Institutional Repository

§  Scientific Data Curation Project

§  The Data Audit

§  Data Curation Workflow

§  Data Repository

(3)
(4)

U.PORTO

Porto Metropolitan Area: 1 000 000 inhabitants

State university founded on 22 March 1911

Origins date back to the 18th century

(5)

U.PORTO :: Geographic distribution

[Map: Pole 1, Pole 2, Pole 3]

(6)

U.PORTO :: Schools and Research Units

§  Rectorate/Central Services
§  14 Schools
Ø  School of Architecture
Ø  School of Fine Arts
Ø  School of Sciences
Ø  School of Nutrition and Food Science
Ø  School of Sport
Ø  School of Law
Ø  School of Economics
Ø  School of Engineering
Ø  School of Pharmacy
Ø  School of Arts
Ø  School of Medicine
Ø  School of Dental Medicine
Ø  School of Psychology and Education Science
Ø  Institute of Biomedical Sciences Abel Salazar
Ø  Business School
§  ~70 R&D+i units
Ø  31 assessed as excellent or very good
§  30 Libraries + 12 Museums
§  Student Support Services

(7)

U.PORTO :: Academic Community

§  Students
Ø  30.898 (total)
Ø  8% mobility
Ø  1st cycle →  9.647
Ø  Integrated Master →  12.758
Ø  Master + 2nd cycle →  5.406
Ø  Specialization →  4258
Ø  PhD + 3rd cycle →  2.828

§  Teachers & researchers

Ø  2.366

→  76% PhD

§  Technical & Administrative staff

(8)

U.Porto :: Teachers & Researchers with PhD

76% of all the academic community


(9)

U.Porto :: Position in international rankings

Ranking | Portugal 2011 | Europe 2011 | World 2011 | Portugal 2010 | Europe 2010 | World 2010
Academic Ranking of World Universities (Shanghai Jiao Tong University) | 1 | 124-164 | 301-400 | 1 | 169-204 | 401-500
Performance Ranking of Scientific Papers for World Universities (Taiwan) | 1 | 141 | 320 | 1 | 141 | 328
Quacquarelli Symonds – QS World University Rankings | 2 | 185-203 | 401-450 | 3 | - | 451-500
Webometrics (CSIC, Madrid) | 1 | 50 | 178 | 1 | 79 | 230
The Leiden Ranking | 1 | 112 | 280 | 1 | 136 | -
SCImago Institutions Rankings (SIR) | 1 | 77 | 254 | 1 | 90 | 265
University Ranking by Academic Performance

(10)

U.PORTO: INFORMATION SYSTEM & INSTITUTIONAL REPOSITORY

(11)
(12)
(13)

U.PORTO :: Institutional Repository

§  Nov 2007: < 1.000 publications

§  Nov 2011: + 18.000 publications

(14)

Publications :: From SIGARRA to the Open Repository

SIGARRA → OPEN REPOSITORY: migration of full text & open access publications

(15)

U.PORTO :: Open Repository :: 2008-2011


(16)

U.Porto :: Scientific Domains

§  EXACT SCIENCES
§  NATURAL SCIENCES
§  HEALTH SCIENCES
§  ENGINEERING AND TECHNOLOGY
§  SOCIAL SCIENCES

(17)

U.Porto :: Scientific Domains and Sub-domains

§  EXACT SCIENCES
Ø  Physics
Ø  Mathematics
Ø  Chemistry
§  NATURAL SCIENCES
Ø  Earth and Space Sciences
Ø  Biological Sciences
Ø  Agricultural Sciences
Ø  …
§  ARTS AND HUMANITIES
Ø  Literature Studies
Ø  Biological Sciences
Ø  Art Studies
Ø  ...
§  SOCIAL SCIENCES
Ø  Economics and Management
Ø  Law and Political Sciences
Ø  Educational Sciences and Policies
Ø  Communication Sciences
Ø  …
§  ENGINEERING AND TECHNOLOGY SCIENCES
Ø  Civil Engineering
Ø  Electrical Engineering
Ø  Informatics
Ø  Mechanical Engineering
Ø  …
§  HEALTH SCIENCES

(18)

2012-2015 :: Scientific Data at U.Porto

[Diagram labels: SIGARRA, OPEN REPOSITORY, THEMATIC REPOSITORY, SCIENTIFIC DATA REPOSITORY; Full text, Open access]

§  INSTITUTIONAL REPOSITORY
Ø  Ingest
Ø  Storage
Ø  Preservation
Ø  Access
Ø  Dissemination

(19)

MANAGING RESEARCH DATA AT U.PORTO

(20)

The “standard” research workflow

Base Data Publication

(21)

However…

(22)

U.PORTO :: Data Curation Initiative

§  Curation of and access to the scientific data generated by researchers

§  Short study: February 1 to September 30, 2011

§  Expected results

Ø  A first impression on researchers’ needs
Ø  Sample of existing curation practices
Ø  Sample of existing datasets
Ø  Analysis of datasets from a technical point of view
Ø  Collection of datasets in a standard repository platform
Ø  Experimental interrogation of datasets

(23)

Project Phases

Gather Datasets & Use Cases → Specify Workflow → Build platform

(24)

Evaluating the research data management effort

§  Interviewing researchers in several areas
§  Collecting data samples
§  Documenting use cases for research data
§  Identifying data curation practices
§  ... evaluate resources and select the problems to be addressed

(25)

Our users, the researchers

§  …are not data preservation experts
§  ...use many document formats

(26)

Addressing researchers’ needs

§  Repositories cannot be “graveyards for data”, they have to provide effective ways to access the stored data

§  Data has to be well annotated, or else it cannot be reused (experiment contexts, meanings of variables…)

§  Better ways to find data (e.g. domain-specific restrictions and not just generic metadata)

§  Easy sharing of data (e.g. sending a link to the place where a user can find a specific dataset)

§  Researchers can be cited by their peers through the datasets that they offer

(27)
(28)

Project Phases

Gather Datasets & Use Cases → Specify Workflow → Build platform → Deposit Datasets

Phase 1: Interviews

(29)

Interviews :: Nature of data

§  Data managed by the researchers
Ø  Personally collected in the context of projects
Ø  Obtained in the context of contracts with external entities
Ø  Automatically collected from experimental setups

(30)

Interviews :: Curation Practices

§  Mostly informal

Ø  Researchers keep copies of data in personal machines and additional removable media

Ø  Group leaders keep record of experiments and associate data to published results

Ø  For some non-active data, only paper records exist

§  Exception: ecology group

Ø  Preparing a curation plan in the context of an international project

§  Some data can be re-generated

Ø  Queries to databases of official statistics

§  Some data is processed by specialized software

(31)

Interviews :: Use Cases

§  Publication

Ø  Relation with published material very relevant

§  Re-use within a group

§  Sharing with project partners

§  Use in industry

Ø  Data with relevance for economic processes (e.g. gravimetry)

Ø  Data collected by industrial partners for contract work (e.g. pollutant analysis)

§  Search data

(32)

Interviews :: Metadata

§  Mostly nonexistent
Ø  Researchers add some annotations for their own use
Ø  Dataset-level metadata missing
§  Data from interviews (social sciences)
Ø  Some metadata from interview scripts
§  Possible source: experimental setup scripts
§  Some domains are more advanced

(33)

Data :: Domains and Access Conditions

Domain | Dataset | Access
Astronomy | Gravimetry | Free
Chemical Engineering | Pollutant analysis | Contract pending
Mechanical Engineering | Material fracture | Embargoed
Civil Engineering | High-speed railways | Embargoed
Educational Science | Interviews | Embargoed
Psychology | Interaction records | Embargoed
Economy | Population | Embargoed
Ecology | Plant distribution | Embargoed

(34)

Interviews :: What is left out

§  Several interviews revealed complex cases
Ø  Data which resulted from past projects and is no longer used
Ø  Data in non-digital formats
Ø  Data with complex ethics constraints
§  Current concern is with data for which the creators are available and interested in curation

(35)

THE RESEARCH WORKFLOW:

(36)

Project Phases

Gather Datasets & Use Cases → Specify Workflow → Build platform → Deposit Datasets

Phase 2: Specify Workflow

(37)

The role of the “Data Curator”

Data Curator

Researcher

(38)

Data curation meeting

Meeting

(39)

Annotating data

Table-level metadata
dc:contributor.author: Silva, João Rocha
dc:title: Azores GPS Run
dc:lastModified: 01-01-2011
dc:rights: License: CC ShareAlike
END_METADATA

Terceira, Flores

Data Dimensions
time.gps_sow | latitude | longitude
488500.999190 | 38.760267493 | -27.084113746
488499.999191 | 38.760267489 | -27.084113743
488498.999192 | 38.760267506 | -27.084113739
488497.999193 | 38.760267485 | -27.084113744
488496.999194 | 38.760267507 | -27.084113730

gravity.specific: -107.391006, -93.994527, -80.584969, -67.168032, -53.750371
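
The annotated example on this slide combines Dublin Core fields (such as dc:title), an END_METADATA marker, and named data columns. A minimal Python sketch of how such a file might be read back into metadata and rows follows. The tab-separated text layout and the function name parse_annotated_table are assumptions made for illustration; the slides do not specify the project's actual file format.

```python
import csv
import io


def parse_annotated_table(text):
    """Split an annotated table into (metadata, rows).

    Assumed (hypothetical) layout: 'key<TAB>value' metadata lines,
    a line reading END_METADATA, a tab-separated header line,
    then numeric data lines.
    """
    lines = text.strip().splitlines()
    marker = lines.index("END_METADATA")

    # Everything before the marker is table-level metadata.
    metadata = {}
    for line in lines[:marker]:
        key, value = line.split("\t", 1)
        metadata[key] = value

    # Everything after the marker is the data table itself.
    table = io.StringIO("\n".join(lines[marker + 1:]))
    reader = csv.DictReader(table, delimiter="\t")
    rows = [{name: float(value) for name, value in row.items()} for row in reader]
    return metadata, rows


sample = "\n".join([
    "dc:title\tAzores GPS Run",
    "dc:contributor.author\tSilva, João Rocha",
    "dc:rights\tCC ShareAlike",
    "END_METADATA",
    "time.gps_sow\tlatitude\tlongitude",
    "488500.999190\t38.760267493\t-27.084113746",
    "488499.999191\t38.760267489\t-27.084113743",
])

metadata, rows = parse_annotated_table(sample)
print(metadata["dc:title"], len(rows))
```

Keeping the metadata block in the same file as the data means the annotations travel with the dataset when it is shared, which matches the need stated earlier that data must be well annotated to be reusable.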

(40)

After the meeting

Repository

(41)

How other researchers will see it

Explore

Filter

Download just what you need

Researcher

(42)

Project Phases

Gather Datasets & Use Cases → Specify Workflow → Build platform → Deposit Datasets

Phase 3: Build tools to support the workflow

(43)

[Architecture diagram: UPData Scientific Data Module and DSpace Core]

§  Components: Ingestion page, XLSX Parser, XML Manager, XSLT Transformer, Query translator, Data Access, Dynamic Table

§  Artifacts: Original File, Formatted Spreadsheet, Translated Document (XML), Filtering Request, Filtering Query (JSON), XQuery FLWOR, Results (Data + Metadata), Formatted Results

§  Actors: Researcher, Curator
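
The architecture on this slide names a query translator sitting between a filtering query (JSON) and XQuery FLWOR. One possible shape for that translation step is sketched below in Python. The JSON schema (a list of column/op/value conditions), the `<row>` element layout of the translated XML document, the default document name, and the function name to_flwor are all hypothetical; the slides do not describe the real implementation.

```python
import json

# Hypothetical whitelist of comparison operators accepted from the client.
ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">="}


def to_flwor(filter_json, doc_uri="dataset.xml"):
    """Build an XQuery FLWOR string from a JSON filtering query.

    Assumes (hypothetically) that the translated document stores each
    data row as a <row> element with one child element per column.
    """
    conditions = json.loads(filter_json)["conditions"]
    clauses = []
    for cond in conditions:
        # Reject anything that is not a plain comparison on a simple name,
        # so user input is never pasted into the query unchecked.
        if cond["op"] not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {cond['op']}")
        if not cond["column"].replace("_", "").replace(".", "").isalnum():
            raise ValueError(f"unsafe column name: {cond['column']}")
        clauses.append(
            f"number($r/{cond['column']}) {cond['op']} {float(cond['value'])}"
        )
    where = " and ".join(clauses) if clauses else "true()"
    return f'for $r in doc("{doc_uri}")//row where {where} return $r'


query = to_flwor('{"conditions": [{"column": "latitude", "op": ">", "value": 38.5}]}')
print(query)
```

Validating operators and column names before building the query string is a design choice of this sketch: it keeps the filter limited to numeric comparisons, which is enough for the "download just what you need" use case on tabular data.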

(44)

Project Phases

Gather Datasets & Use Cases → Specify Workflow → Build platform → Deposit Datasets

Phase 4: Test tool using real world data

(45)

DATA DEPOSIT

(46)

DATA EXPLORING AND DOWNLOAD

- DEMO (VIDEO 2)

(47)

FIND DATASETS

- DEMO (VIDEO 3)

(48)

Data Curation :: Preliminary conclusions

§  Interaction with researchers is crucial

Ø  Data with very different structure, contents and volume

§  Similar use cases in data search

Ø  Suggests models with common search features

§  U.Porto Data Repository

Ø  Project encourages the definition of a data curation policy

§  DSpace has been successfully customized to include Data Exploration capabilities for tabular data

§  Open Access is not yet an issue

Ø  Project helps to gain researchers' confidence in the approach first

(49)

Data Curation :: Validating the prototype

§  Next steps with the researchers

Ø  Presenting their data in the developed repository platform

Ø  Evaluating the perceived usefulness of the implemented features

Ø  Gathering feedback on additional features to be implemented

→ Connecting datasets to their publications?

→ Offering more sophisticated data access controls?

(50)

Data Management :: A Service

§  What does a data management service look like?

Ø  Data curation as an ongoing process and not only at the end

Ø  Online documentation to help researchers know what their role in the process is

Ø  Support/training in the usage of the platform for self-deposit

(51)

Future Work

Ø  Gather feedback on the data repository extension from the group of researchers who have been interviewed

Ø  Additional features of the repository

→  Fine-grained data access control

→  Data dissemination through standard representations (OAIS…)

Ø  Dataset-level metadata

→  DCMI - Science Metadata

Ø  Features of a data management service for U.Porto

→  Require further exploration

Ø  Data management policy for U.Porto
