(1)

Management of research data at U.Porto: from researcher needs to curation workflows supported on a data repository

Eugénia Matos Fernandes, Cristina Ribeiro, João Correia Lopes, João Rocha da Silva

RECOLECTA Webinar December 19 2011

(2)

Contents

§  U.Porto: a research university
§  The U.Porto Information System and Institutional Repository
§  Scientific Data Curation Project
§  The Data Audit
§  Data Curation Workflow
§  Data Repository

(3)
(4)

U.PORTO

Porto Metropolitan Area =

1 000 000 inhabitants

State university created on 22 March 1911

Origins date back to the 18th century

(5)

U.PORTO :: Geographic distribution

[Map: Pole 1, Pole 2, Pole 3]

(6)

U.PORTO :: Schools and Research Units

§  Rectorate/Central Services
§  14 Schools
Ø  School of Architecture
Ø  School of Fine Arts
Ø  School of Sciences
Ø  School of Nutrition and Food Science
Ø  School of Sport
Ø  School of Law
Ø  School of Economics
Ø  School of Engineering
Ø  School of Pharmacy
Ø  School of Arts
Ø  School of Medicine
Ø  School of Dental Medicine
Ø  School of Psychology and Education Science
Ø  Institute of Biomedical Sciences Abel Salazar
Ø  Business School
§  ~70 R&D+i units
Ø  31 assessed as excellent or very good
§  30 Libraries + 12 Museums
§  Student Support Services

(7)

U.PORTO :: Academic Community

§  Students
Ø  30,898 (total)
Ø  8% mobility
Ø  1st cycle →  9,647
Ø  Integrated Master →  12,758
Ø  Master + 2nd cycle →  5,406
Ø  Specialization →  4,258
Ø  PhD + 3rd cycle →  2,828
§  Teachers & researchers
Ø  2,366
→  76% PhD
§  Technical & Administrative staff

(8)

U.Porto :: Teachers & Researchers with PhD

76% of all the academic community

[Chart: percentage axis from 0 to 100%]

(9)

U.Porto :: Position in international rankings

International Rankings | Portugal 2011 | Europe 2011 | World 2011 | Portugal 2010 | Europe 2010 | World 2010
Academic Ranking of World Universities (Shanghai Jiao Tong University) | 1 | 124-164 | 301-400 | 1 | 169-204 | 401-500
Performance Ranking of Scientific Papers for World Universities (Taiwan) | 1 | 141 | 320 | 1 | 141 | 328
Quacquarelli Symonds (QS) World University Rankings | 2 | 185-203 | 401-450 | 3 | - | 451-500
Webometrics (CSIC, Madrid) | 1 | 50 | 178 | 1 | 79 | 230
The Leiden Ranking | 1 | 112 | 280 | 1 | 136 | -
SCImago Institutions Rankings (SIR) | 1 | 77 | 254 | 1 | 90 | 265
University Ranking by Academic Performance

(10)

U.PORTO: INFORMATION SYSTEM & INSTITUTIONAL REPOSITORY

(11)
(12)
(13)

U.PORTO :: Institutional Repository

Nov 2007: < 1,000 publications

Nov 2011: + 18,000 publications

(14)

Publications :: From SIGARRA to the Open Repository

SIGARRA

OPEN REPOSITORY

Migration of full text & open access publications

(15)

U.PORTO :: Open Repository :: 2008-2011

[Chart: Open Repository, 2008-2011; axis from 0 to 20,000]

(16)

U.Porto :: Scientific Domains

§  EXACT SCIENCES
§  NATURAL SCIENCES
§  HEALTH SCIENCES
§  ENGINEERING AND TECHNOLOGY
§  SOCIAL SCIENCES

(17)

U.Porto :: Scientific Domains and Sub-domains

§  EXACT SCIENCES
Ø  Physics
Ø  Mathematics
Ø  Chemistry
§  NATURAL SCIENCES
Ø  Earth and Space Sciences
Ø  Biological Sciences
Ø  Agricultural Sciences
Ø  …
§  ARTS AND HUMANITIES
Ø  Literature Studies
Ø  Biological Sciences
Ø  Art Studies
Ø  ...
§  SOCIAL SCIENCES
Ø  Economics and Management
Ø  Law and Political Sciences
Ø  Educational Sciences and Policies
Ø  Communication Sciences
Ø  …
§  ENGINEERING AND TECHNOLOGY SCIENCES
Ø  Civil Engineering
Ø  Electrical Engineering
Ø  Informatics
Ø  Mechanical Engineering
Ø  …
§  HEALTH SCIENCES

(18)

2012-2015 :: Scientific Data at U.Porto

[Diagram: SIGARRA, OPEN REPOSITORY (full text, open access), THEMATIC REPOSITORY, SCIENTIFIC DATA REPOSITORY]

INSTITUTIONAL REPOSITORY
•  Ingest
•  Storage
•  Preservation
•  Access
•  Dissemination

(19)

MANAGING RESEARCH DATA AT U.PORTO

(20)

The “standard” research workflow

Base Data Publication

(21)

However…

(22)

U.PORTO :: Data Curation Initiative

§  Curation of and access to the scientific data generated by researchers
§  Short study: February 1 to September 30, 2011
§  Expected results
Ø  A first impression of researchers’ needs
Ø  Sample of existing curation practices
Ø  Sample of existing datasets
Ø  Analysis of datasets from a technical point of view
Ø  Collection of datasets in a standard repository platform
Ø  Experimental interrogation of datasets

(23)

Project Phases

Gather Datasets & Use Cases →  Specify Workflow →  Build platform

(24)

Evaluating the research data management effort

§  Interviewing researchers in several areas
§  Collecting data samples
§  Documenting use cases for research data
§  Identifying data curation practices
§  ... evaluate resources and select the problems to be addressed

(25)

Our users, the researchers

§  …are not data preservation experts
§  ...use many document formats

(26)

Address researchers’ needs

§  Repositories cannot be “graveyards for data”, they have to provide effective ways to access the stored data

§  Data has to be well annotated or else cannot be reused (experiment contexts, meanings of variables…)

§  Better ways to find data (e.g. domain-specific restrictions and not just generic metadata)

§  Easy sharing of data (e.g. sending a link to the place where a user can find a specific dataset)

§  Researchers can be cited by their peers through the datasets that they offer

(27)
(28)

Project Phases

Gather Datasets & Use Cases →  Specify Workflow →  Build platform →  Deposit Datasets

Phase 1: Interviews

(29)

Interviews :: Nature of data

§  Data managed by the researchers
Ø  Personally collected in the context of projects
Ø  Obtained in the context of contracts with external entities
Ø  Automatically collected from experimental setups

(30)

Interviews :: Curation Practices

§  Mostly informal

Ø  Researchers keep copies of data in personal machines and additional removable media

Ø  Group leaders keep record of experiments and associate data to published results

Ø  For some non-active data, only paper records exist

§  Exception: ecology group

Ø  Preparing a curation plan in the context of an international project

§  Some data can be re-generated

Ø  Queries to databases of official statistics

§  Some data is processed by specialized software

(31)

Interviews :: Use Cases

§  Publication

Ø  Relation with published material very relevant

§  Re-use within a group

§  Sharing with project partners §  Use in industry

Ø  Data with relevance for economic processes (ex: gravimetry)

Ø  Data collected by industrial partners for contract work (ex: pollutant analysis)

§  Search data

(32)

Interviews :: Metadata

§  Mostly nonexistent
Ø  Researchers add some annotations for their own use
Ø  Dataset-level metadata missing
§  Data from interviews (social sciences)
Ø  Some metadata from interview scripts
§  Possible source: experimental setup scripts
§  Some domains are more advanced

(33)

Data :: Domains and Access Conditions

Domain | Dataset | Access
Astronomy | Gravimetry | Free
Chemical Engineering | Pollutant analysis | Contract pending
Mechanical Engineering | Material fracture | Embargoed
Civil Engineering | High-speed railways | Embargoed
Educational Science | Interviews | Embargoed
Psychology | Interaction records | Embargoed
Economy | Population | Embargoed
Ecology | Plant distribution | Embargoed

(34)

Interviews :: What is left out

§  Several interviews revealed complex cases
Ø  Data which resulted from past projects and is no longer used
Ø  Data in non-digital formats
Ø  Data with complex ethics constraints
§  Current concern is with data for which the creators are available and interested in curation

(35)

THE RESEARCH WORKFLOW:

(36)

Project Phases

Gather Datasets & Use Cases →  Specify Workflow →  Build platform →  Deposit Datasets

Phase 2: Specify Workflow

(37)

The role of the “Data Curator”

[Diagram: Data Curator and Researcher]

(38)

Data curation meeting

Meeting

(39)

Annotating data

[Figure: annotated spreadsheet for a sample dataset (Terceira, Flores)]

Table-level metadata
dc:title: Azores GPS Run
dc:contributor.author: Silva, João Rocha
dc:lastModified: 01-01-2011
dc:rights: License: CC ShareAlike
END_METADATA

Data Dimensions: time.gps_sow, latitude, longitude, gravity.specific

Sample values shown on the slide
time.gps_sow: 488500.999190, 488499.999191, 488498.999192, 488497.999193, 488496.999194
latitude: 38.760267493, 38.760267489, 38.760267506, 38.760267485, 38.760267507
longitude: -27.084113746, -27.084113743, -27.084113739, -27.084113744, -27.084113730
gravity.specific: -107.391006, -93.994527, -80.584969, -67.168032, -53.750371
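As an illustration of the annotation idea on this slide, the sketch below reads a table that carries its own table-level metadata above an END_METADATA marker, followed by a row naming the data dimensions. The CSV layout, the pairing of each dc: field with a value, and the function name parse_annotated_table are assumptions made for this example; the slide does not specify the actual file format used by the project.

```python
import csv
import io

# Hypothetical CSV rendering of the annotated spreadsheet; the layout
# (key/value lines, END_METADATA marker, header row) is assumed.
SAMPLE = """\
dc:title,Azores GPS Run
dc:contributor.author,"Silva, João Rocha"
dc:lastModified,01-01-2011
dc:rights,License: CC ShareAlike
END_METADATA
time.gps_sow,latitude,longitude
488500.999190,38.760267493,-27.084113746
488499.999191,38.760267489,-27.084113743
"""


def parse_annotated_table(text):
    """Split an annotated table into (metadata, dimensions, rows)."""
    reader = csv.reader(io.StringIO(text))
    metadata = {}
    for record in reader:
        if not record:
            continue  # skip blank lines in the metadata block
        if record[0].strip() == "END_METADATA":
            break
        key = record[0].strip()
        metadata[key] = record[1].strip() if len(record) > 1 else ""
    else:
        raise ValueError("END_METADATA marker not found")
    # The first row after the marker names the data dimensions.
    dimensions = next(reader)
    rows = [
        dict(zip(dimensions, (float(value) for value in record)))
        for record in reader
        if record
    ]
    return metadata, dimensions, rows


metadata, dimensions, rows = parse_annotated_table(SAMPLE)
print(metadata["dc:title"], dimensions, len(rows))
```

Keeping the metadata in the same file as the data means a curator and a researcher can agree on the annotations during the meeting and deposit a single self-describing file.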

(40)

After the meeting

Repository

(41)

How other researchers will see it

Explore

Filter

Download just what you need

Researcher
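The "filter, then download just what you need" idea can be sketched in a few lines. This is only an illustration under assumptions: the function export_subset, the choice of CSV as download format, and the threshold used in the example are hypothetical, and the sample rows reuse values shown on the annotation slide.

```python
import csv
import io


def export_subset(rows, columns, predicate):
    """Return CSV text holding only the chosen columns of the matching rows."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # silently drop columns that were not requested
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        if predicate(row):
            writer.writerow(row)
    return buffer.getvalue()


# Sample rows (illustrative values taken from the annotation slide).
rows = [
    {"time.gps_sow": 488500.999190, "latitude": 38.760267493, "longitude": -27.084113746},
    {"time.gps_sow": 488499.999191, "latitude": 38.760267489, "longitude": -27.084113743},
]

# Keep only positions north of an arbitrary threshold, without the time column.
subset = export_subset(
    rows,
    ["latitude", "longitude"],
    lambda row: row["latitude"] > 38.76026749,
)
print(subset)
```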

(42)

Project Phases

Gather Datasets & Use Cases →  Specify Workflow →  Build platform →  Deposit Datasets

Phase 3: Build tools to support the workflow

(43)

[Architecture diagram: UPData Scientific Data Module and DSpace Core, with numbered steps 1 to 5]

Actors: Researcher, Curator
Components: Ingestion page, XLSX Parser, XSLT Transformer, XML Manager, Query translator, Data Access, Dynamic Table
Artifacts and messages: Original File, Formatted Spreadsheet, Translated Document (XML), Filtering Request, Filtering Query (JSON), XQuery FLWOR, match, Results (Data + Metadata), Formatted Results
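The diagram names a query translator that sits between a JSON filtering query and an XQuery FLWOR expression. A minimal sketch of such a translation step follows. Everything specific in it is an assumption for illustration: the JSON shape ("filters" with "column", "op", "value"), the operator names, the XML layout /table/row/cell[@column], and the function translate_filter are hypothetical and are not taken from the UPData implementation.

```python
import json

# Hypothetical operator names mapped to XQuery general comparison operators.
OPERATORS = {"eq": "=", "ne": "!=", "lt": "<", "le": "<=", "gt": ">", "ge": ">="}


def translate_filter(query_json, doc_uri):
    """Translate a JSON filtering query into an XQuery FLWOR expression string."""
    query = json.loads(query_json)
    conditions = []
    for cond in query.get("filters", []):
        column = cond["column"]
        # Accept only simple names such as "gravity.specific" or "time.gps_sow".
        if not column.replace("_", "").replace(".", "").isalnum():
            raise ValueError(f"unsafe column name: {column!r}")
        operator = OPERATORS[cond["op"]]
        value = float(cond["value"])  # this sketch handles numeric filters only
        conditions.append(
            f'number($row/cell[@column = "{column}"]) {operator} {value!r}'
        )
    where = f"\nwhere {' and '.join(conditions)}" if conditions else ""
    return f'for $row in doc("{doc_uri}")/table/row{where}\nreturn $row'


example = json.dumps({
    "filters": [
        {"column": "latitude", "op": "gt", "value": 38.76},
        {"column": "gravity.specific", "op": "lt", "value": -80},
    ]
})
print(translate_filter(example, "dataset.xml"))
```

Validating column names and coercing values to numbers before building the query string is one simple way to keep user-supplied filters from injecting arbitrary XQuery.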

(44)

Project Phases

Gather Datasets & Use Cases →  Specify Workflow →  Build platform →  Deposit Datasets

Phase 4: Test tool using real world data

(45)

DATA DEPOSIT

(46)

DATA EXPLORING AND DOWNLOAD

- DEMO (VIDEO 2)

(47)

FIND DATASETS

- DEMO (VIDEO 3)

(48)

Data Curation :: Preliminary conclusions

§  Interaction with researchers is crucial

Ø  Data with very different structure, contents and volume

§  Similar use cases in data search

Ø  Suggests models with common search features

§  U.Porto Data Repository

Ø  Project encourages the definition of a data curation policy

§  DSpace has been successfully customized to include Data Exploration capabilities for tabular data

§  Open Access is not yet an issue

Ø  The project first helps to build researchers’ confidence in the approach

(49)

Data Curation :: Validating the prototype

§  Next steps with the researchers

Ø  Presenting their data in the developed repository platform

Ø  Evaluating the perceived usefulness of the implemented features

Ø  Gathering feedback on additional features to be implemented

→ Connecting datasets to their publications?

→ Offering more sophisticated data access controls?

(50)

Data Management :: A Service

§  What does a data management service look like?

Ø  Data curation as an ongoing process and not only at the end

Ø  Online documentation to help researchers know what their role in the process is

Ø  Support/training in the usage of the platform for self-deposit

(51)

Future Work

Ø  Gather feedback on the data repository extension from the group of researchers who have been interviewed

Ø  Additional features of the repository

→  Fine-grained data access control

→  Data dissemination through standard representations (OAIS…)

Ø  Dataset-level metadata

→  DCMI - Science Metadata

Ø  Features of a data management service for U.Porto

→  Require further exploration

Ø  Data management policy for U.Porto
