IBGE I IBGE I Information Technology Directorate Database Department
2. PresentSituation 1 Services Offered
Author:
IBGE'S DATABASE: TECHNOWGICAL CHALLENGES
(TEXTO AINDA SEM REVISÃO DO AUTOR)
Maria
Célia Pelisson Jacon
IBGE
IInformation Technology Directorate
SAS. A user with mainframe access can, today, handle
directly allinformation ofhis interest, without the interference ofDEBAD's technicians.
For application programmers, the integration of
MetadataBase with
ATLASand
CRIPTAcompilers simplifies the
handlingof conventional files.
Atool was developed
which
extracts from
thatbank the description of dictionaries
in aformat
wbich iscompatible with
thatused by those languages.
For the
usersof informa.tion in
SASenvironments, a tool
wasdeveloped for the selection of
variables,interactively
usingthe descriptions ex:isting
inthe
MetadataBase.
Thetool generates all definitions and commands necessary for
bandlingthe microdata
andvalue
· extraction files.
For the Microdata users, the interfàces developed
byDEBAD allow the utilization of
SAS ina batch or on line enviromnent,
extractingvariables
ina very user-fiiendly
manner. The definition of
attributesand their rela.tlonships
isdone smarily
bythe interfàce
ina
transparentmanner for the user, who analyses the
da.tamod_el·and defines
theprocedures for the execution of the
task.For intemal use, including the sectors in charge
witbthe dissemination of
dataat
• mGE,
tools
aremade available for the generation of descriptive catalogs for the
: information
existingin the Metadata Base.
lnan interactive manner, the user
isaided by
menusto obtain the infonnation sought and, in finding
them,generate the catalog.
· 2.2 Probklm EllCOIUdered
Despite
thegood service coverage offered
by D~~'the operational enviromnent impo_ses some restrictions.
Theexclusive uriliution of
proprietarymethods and protocols
restricts the
utfü?.ation of the services to
theiilltemal public and a few
data analysescenters
· connected to
mGE's
SNAnetwork.
The urilization of the data can
also beseen as a
restrictiveelement
againstaccess to information. Excepting for
thesurvey wbich
wasorganized in Theme
DataBases, the incorporation of
datato
th~collection is done according to
thestructure
used inthe survey 'productlon process
andnot according to the needs for dissemination and information
analyses
processes.
. · The
documentation
used forthe implementation of the Metadata
Base was that existing in development mam1als of survey systems, accordingto DEBAD's files.
lnmany
cases,survey
producersdid not make
thenec.essary
reviewfor the validation of
tbisinformation, and have not
yetcomplemented
them with theirsemantic description.
Despite
being a concemof
interna.lbodies, there
isnot
yeta definition
ofpolicies
forthe democratiz.ation of
accessto microdata
which will consider thecommercial
aspectsas well as technical
andlegal issues of the decharacterization of information.
Finally, the technological innovations from graphics techniques and languages for the programming of computer
systems,as
wellas the communications fàcilities
affordedby Internet
must beincorporated to the methods and processes used in
theproduction of our work.
3. The Metainformatiion System
DEBAD
basbeen attempting to
systematizethe
statisticalmetainformation
recording process in
its data managementsystem for over
15years. Initially all efforts were
directed to
theidentification of the constitution elements of statistical metadata and the
definition of operating standards and procedures for
itsacquisition.
As Yves Franchet, General Director ofEUROSTAT [FRAN93], a piece of
information has no meaning in itself Its relevance is restricted to
a specific
context,usually
well known by the information producers, however unknown to the majority of users.3.1 Metadata Base
ln 1986, a database project was initiated, wbich was implemented in 1989 under the name Metadata Base. This base was implemented using the Database Management System, IDMS, basically contemplating statistical data, to the extent that these were the majority of the collection of the database.
The manner of interaction with the
Metada.ta
Base is quite limited.Its interfàce
was built using the technological interfaces which were available at the time, resulting in an interface based on commands executed from rank:ed menus. Two systemsare
made available by the interfàce: an updating system oriented to the maintenance of database infonnation; anda
inquüy system oriented to the users ofIBGE' sSNA
network.The Metadata Base allows the record ot:
• survey
according
to producingareas;
• files produced in the many occurrences of each survey;
• the description of
the
data dictionary ofeach
file;• the classifications and code categories used; and
• the critique
plan
for survey usingCRIPTA
An indexation system for the database variables was developed and partially implemented, ainring at thf" definition of a documentary langnage for the construction
rf
IBGE's thesaurus and tools for inf'ormation search and retrieval.3.2 The MetaBD Project
The Metadata Base ieview work - MetaBD -is underwa.y, within IBGE' s Database Revamping and Reformulation project. The MetaBD wiil add to metadata descriptions of information originating from mGE' s geoscientüic area: geography, territory structures, cartography, geodesy, natural
resources
andenvironmental studies.
Besides, informationare
a1sO
contemplated about IBGE's
products available in other media ( other than magnetic).Thus, MetaBD will reference myriad institutional products such as, for example, statistical yearbooks and maps. MetaBD is being developed in a relational environment, using SGDB ORACLE, in an open platform, in a client-server architecture .. It can be accessed through Internet, adding new user categories to the services provided by DEBAD in
metainformation.
Whereas the pioneer Metadata Base :fàced the challenge of orgmrizing a large data volume, MetaBD prime objective
will
be to make mGE more visible to the public at large.With MetaBD externai users ( experts or otherwise)
may
investigate the availability of data in the collection, from inqujries made in the respective metadata Besides the existence of information, it will be poSSlõle to obtain its semantic description,its
storage structure and the instructions for data extraction from the many dissemination systems and the ana1ysis of the information held by IBGE.3.3 The lntegratol' Database (BDI)
The new IT environment proposed for the organization and access to data at IBGE, to be managed and controlled by DI,
will
be in fàct composed by two main databases: oneMetadata base (MetaBD) and the Integrator Database (BDI). While MetaBD contains
reference and descriptive informationrelating
to data in the collection, BDI will contain allremaining information required for integrated access to the many databases, these being under the control of
aDatabase Management System or otherwise.
3.3.1 Organizing IBGE's Daúl Collection
In
practice, BDI will represem the overall scheme of ffiGE' s
datacollection, today organized in many management
systemsand conventional files in different computing . platforms.
Thisorganiz.ation
isproposed bearing in mind heterogeneous database systems, . today
areality and technological requirement.
· The different
systems and mannersof
dataorganizing
mustcoexist in-the most
integratedmanner, be.canse
that isa
naturalprocess in large organiz.atioils such as
IBGE.These
multiple
systemsand
dataare operated
andmanaged.
underlocal, stand-alone rules,
· with data
models, languages and
manyrepresentations, often for
similarinformation.
Thus,the existence of another leve! of
dataabstraction
isjustifiable, providing access to the
many. conventional systems and files, without need for changes and adaptations to existing
. systems.• 3.J.2 Access
to Data in the CollectionThe many applications wbich require data access may do so using responses to BDI inquiries (and also to MetaBD), wbich will
sel!'Veas
ageneric interface between
application. programs and
the datacollection. This wi1l be valid for applications
requiringmicrodata as
· well
for
aggregateddata
and productsin
generalBDI
will operate as a data integrating. metabase,
actingas the sole pathway for conventional files
andsomehow complementing
the data dictionaries of existing DBMSs.
·,3.3.3 A Uniform YiewofData
lt
is also expected thatBDI
will act as a large data depository, speedingand
fàcilitating accessto them,
independentlyof platfonn
andmodels of origin.
One of the main functions ofBDI wiII be to provide managers, developers
andusers
in general witha uniform view of clifferent
data anddatabases.
3 .4
Outlooks
DEBAD
is passingthrough an
internaidiscussion
process regardingthe organizarion
·of the existing data collection and the incorporation of new
datato the same collection.
A·~
of ffiGE' s Database R.evamping and Reformulation process, it
is expected,among .other
things, that:• the
migration from the
presentto the new situation processes without major
hurdles, because presentsystems and
datacollections
wiJl beheld operational
untilnew releases
are readyto come on line; ·
• new
access interfdces willafford both the dissemination offfiGE' s
productsto the
public atlarge
(Internet) as wellas
agreater integration of those responsible for the
products withthe ones responsiõle for the storage of respective
information and,
still, withthose who will
makeinformation available
(Intranet );• IBGE's product development process be endowed with tools and tecbniques to ensure the consistency of infonnation and make feasible the
intemaland
extemalaccess to the base with a uniform view of
data.The first
phaseof the project - MetaBD definition and implementation -
iscoming to a close, while other access interlà.ces, definition and implementation of
BDIand the
development of application programs
forthe new environment
are beingspecified.
4. lncorporation of Survey to a Shared Information Processing Environment
Theoretically, every piece of information produced
at IBGE isa candidates for incorporation into the Database collection. We understand
that data mustbe incorporated to the collection as.soon as critiqued and
readyfor tabulation work.
At this Stage, the specialistsin
datamanagement, together with those from the production
anddissemination areas, must adopt a data
structurewhich aims at fàcilitating d;ata handling in information analysis and dissemination processes.
A
production
system,such as the
'96Agricultural and Livestock
Census,which considers a shared base
atthe state level,
mayshow a
greaterefficiency in
theproduction processes
andeven for
state analysis.However,
inarder
tobe
usedin
the statescope
andsatisfy the most diverse information demand profiles, databases such as these
must bereviewed,
ainring atfàcilitating
datadissenrination and statistical analysis.
As
in other data incorporation processes, shared bases must a1so be the object of incorporation to the data collection and documeDtation
in lBGE's metainformation
system.DEBAD
must be the bodyresponsible
for themanagement of
datadistribution
andreplication. Together
with theproducing
catddisseminating bodies,
it must beresponsible for
thedefinirion of fragmentation and replication processes necessary to maintain
availability,
integrityand efiiciency
inthe
handlingof
shared databases.5 Utilization of International Standards
The advent of international conmmnication networks, such as
Internet, higblightsthe imponance of
theadoption of intemational standards in order to make feasiõle the
electronic
exchangeofinformation.
The physical feasiõility of these services depends on the
availabilityof the
commnnications media (network)
wbich is beingimplemented
at IBGE,and on an eflicient collection of metadata It
also depeneison the construction of access
interiàces,of manners of
dataorgbz.ation
andsemamfo descriptions wbich are intemationally recogniz.abte. Thus,
even ifthe
usersfàrthest from the information production processes may
accessthe
database for their information
analysis, dataextraction or information exchange.
6. Data
Management PolicyIt
isimportant for
IBGEthat the
datamanagement policy
be decentralized,which does not imply
in sharingMetaBD. The volume of metada.ta does jot
justify sharing,which would add criticai factors such
asthe definition of
sharing criteria, assuranceof consistency of copies, difficulties
incontrolling access,
etc.The Data Management Policy to be adopted must have management support
startingwith senior management, establishing standards and methodologies for the production of
new
data.DEBAD wishes to assume responsibility for the collection, however this wi1l require that survey developing agencies respect incorporation standards and patters,