Gaia Data Release 1 Documentation release 0

• Data nodes: machines providing the bulk of the disk storage needed for the development, testing,
and operational needs of DPCG. A single data node is configured with a Postgres 9.5 database. For the
next cycle, the plan is to use the distributed Postgres-XL database on multiple nodes, which will provide
horizontal scalability with the number of available hardware nodes while retaining the ease of use and
functionality of a standard SQL database.
• Processing nodes: these provide the bulk of the CPU processing power for DPCG. They use the Sun Grid
Engine (SGE) batch system to launch pipeline runs that process sources in parallel on a
high-performance computing cluster.
• Broker nodes: these provide the middleware for message exchange between the various parts of the
system. They are currently based on a single-node ActiveMQ server.
• Monitoring nodes: machines hosting the web and application server(s) for DPCG, which host part of the
visualisation tool and the continuous integration tool.
• Off-line backup system: providing backup and recovery functionality of the Postgres database.
• Data exchange node: the front-end machine for data exchange between DPCE and DPCG, on which
GTS/Aspera is installed.
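The processing nodes launch pipeline runs through SGE. As an illustration only, the following Python sketch shows one common way to split a source list across the tasks of an SGE array job (submitted, for example, with `qsub -t 1-N`), where each task reads its 1-based index from the `SGE_TASK_ID` environment variable. The contiguous chunking scheme and the function names are assumptions for this sketch, not the DPCG implementation.

```python
import os
from typing import List, Sequence


def chunk_for_task(source_ids: Sequence[int], task_id: int, n_tasks: int) -> List[int]:
    """Return the contiguous slice of source_ids handled by one array task.

    task_id is 1-based, as in SGE's SGE_TASK_ID. Chunk sizes differ by at
    most one source, so the work is spread evenly across the tasks.
    """
    if not 1 <= task_id <= n_tasks:
        raise ValueError("task_id must be in 1..n_tasks")
    base, extra = divmod(len(source_ids), n_tasks)
    index = task_id - 1
    # The first `extra` tasks each receive one additional source.
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return list(source_ids[start:stop])


def current_task_chunk(source_ids: Sequence[int], n_tasks: int) -> List[int]:
    """Pick this task's chunk using the SGE_TASK_ID set by the batch system."""
    raw = os.environ.get("SGE_TASK_ID", "")
    # Outside an array job the variable may be absent or non-numeric;
    # treat that case as task 1 (e.g., for local testing).
    task_id = int(raw) if raw.isdigit() else 1
    return chunk_for_task(source_ids, task_id, n_tasks)
```

With 10 sources and 3 tasks, task 1 receives 4 sources and tasks 2 and 3 receive 3 each; contiguous chunks keep each task's input a simple range, which is convenient when sources are stored in ID order.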
1.3.4.5.2 Gaia DR1
The Integrated Variability Pipeline is built in a modular fashion: chosen parts of the variability analysis
can be included or excluded by editing the configuration file. For normal operations, all ‘scientific’
analyses are expected to be executed. However, given the focus of Gaia DR1 on Cepheid and RR Lyrae
candidates, only a subset of modules was included. The Cycle 01 processing was performed with releases
19.1.x of the following modules:
• VariDataExchange
• VariConfiguration
• VariObjectModel
• VariFramework
• VariStatistics
• VariCharacterisation
• VariClassification
• VariSpecificObjects.
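The idea of including or excluding parts of the analysis through the configuration file can be pictured with a small sketch. Here the configuration is assumed to be a JSON document with a hypothetical `enabled_modules` key; the format, the `run_pipeline` helper, and the fixed execution order of the four analysis modules are illustrative assumptions, not the actual CU7 configuration mechanism.

```python
import json
from typing import Callable, Dict, List

# Analysis modules from the Cycle 01 list; treating them as optional steps
# executed in this order is an assumption made for this sketch.
ANALYSIS_ORDER = [
    "VariStatistics",
    "VariCharacterisation",
    "VariClassification",
    "VariSpecificObjects",
]


def enabled_modules(config_text: str) -> List[str]:
    """Return the enabled analysis modules, in pipeline order, from a JSON config."""
    config = json.loads(config_text)
    enabled = set(config.get("enabled_modules", []))
    unknown = enabled - set(ANALYSIS_ORDER)
    if unknown:
        raise ValueError(f"unknown modules in configuration: {sorted(unknown)}")
    return [name for name in ANALYSIS_ORDER if name in enabled]


def run_pipeline(config_text: str,
                 steps: Dict[str, Callable[[dict], dict]],
                 data: dict) -> dict:
    """Apply the step function of each enabled module to the data, in order."""
    for name in enabled_modules(config_text):
        data = steps[name](data)
    return data


if __name__ == "__main__":
    # Hypothetical configuration enabling only two of the analyses.
    cfg = '{"enabled_modules": ["VariClassification", "VariStatistics"]}'
    print(enabled_modules(cfg))  # ['VariStatistics', 'VariClassification']
```

Keeping the execution order in code and only the on/off switches in the configuration means that an edited configuration cannot reorder dependent analyses by accident; unknown module names are rejected rather than silently ignored.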
During Cycle 01 processing, a specific subset of sources was processed with the intention of showcasing the
quality of the output that DPCG and CU7 can produce. The number of sources processed by the end of the
mission is expected to be many orders of magnitude larger than what is available in Gaia DR1.
For Cycle 01 processing, DPCG ingested roughly 3.5 million sources over the whole sky, with associated
photometric data (CU5 output selected to have at least 20 observations per source, as variability processing
has been found to be most reliable for sources with 20 or more observations). DPCG and CU7 then retained
only a few tens of thousands of sources of interest based on their position and likelihood of crossmatch. Of
specific interest were sources within the Magellanic Clouds, and thus an area of interest was defined within
38 degrees of the south ecliptic pole. A final list was then retained of the sources of interest that had either
a high probability of being of the RR Lyrae or Cepheid type or an external-catalogue crossmatch with one of
these types. Further cuts were made by the pipeline's analyses, until only 3194 sources were deemed to have
been reliably classified and (visually) validated.
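The two cuts on observation count and position described above can be sketched as follows. The angular distance from the south ecliptic pole equals 90° plus the ecliptic latitude β, which is obtained from equatorial coordinates (α, δ) via sin β = sin δ cos ε − cos δ sin ε sin α, with ε the mean obliquity of the ecliptic at J2000 (about 23.4393°). The `Source` record and the function names are hypothetical, and the crossmatch-likelihood and classification cuts are not modelled.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List

OBLIQUITY_DEG = 23.4392911  # mean obliquity of the ecliptic at J2000.0
MIN_OBSERVATIONS = 20       # minimum number of observations per source
AREA_RADIUS_DEG = 38.0      # radius of the area around the south ecliptic pole


@dataclass
class Source:
    source_id: int
    ra_deg: float   # right ascension (J2000), degrees
    dec_deg: float  # declination (J2000), degrees
    n_obs: int      # number of photometric observations


def ecliptic_latitude_deg(ra_deg: float, dec_deg: float) -> float:
    """Ecliptic latitude beta (degrees) from J2000 equatorial coordinates."""
    ra, dec, eps = map(math.radians, (ra_deg, dec_deg, OBLIQUITY_DEG))
    sin_beta = math.sin(dec) * math.cos(eps) - math.cos(dec) * math.sin(eps) * math.sin(ra)
    # Clamp against rounding slightly outside [-1, 1] before asin.
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_beta))))


def distance_from_south_ecliptic_pole_deg(ra_deg: float, dec_deg: float) -> float:
    """Angular distance from the south ecliptic pole: 90 deg + beta."""
    return 90.0 + ecliptic_latitude_deg(ra_deg, dec_deg)


def select_sources(sources: Iterable[Source]) -> List[Source]:
    """Keep sources with enough observations inside the area of interest."""
    return [
        s for s in sources
        if s.n_obs >= MIN_OBSERVATIONS
        and distance_from_south_ecliptic_pole_deg(s.ra_deg, s.dec_deg) <= AREA_RADIUS_DEG
    ]
```

For example, a source near the Small Magellanic Cloud (α ≈ 13.2°, δ ≈ −72.8°) lies roughly 25° from the south ecliptic pole and passes the position cut, whereas a source on the celestial equator at α = 0° lies 90° away and is rejected.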

1.3.4.6 DPCT
DPCT operates the AVU pipelines in the Cycle-01 processing, including the daily pipelines of the AVU/AIM and
AVU/BAM systems and the data-reduction-cycle pipeline of the GSR system. DPCT processed 800 daily runs of
AIM and BAM and completed the first GSR data-reduction processing. About 100 000 workflows were executed
and about 10.5 million jobs were processed. The received input data amount to about 70 TB, while the largest
Oracle database at DPCT has a size of 150 TB.
The daily pipelines of the AVU/AIM and AVU/BAM systems have been stable since the upgrade implemented at
the end of the commissioning phase, but they continue to evolve in order to improve their results and to add
new modules that enrich the analysis. The AVU/AIM and AVU/BAM software systems have run with versions
16.0, 17.0, and 18.0 (including a number of patch releases to fix specific issues found during operations).
The AVU/AIM pipeline runs with the following modules: Ingestion, Raw Data Processing, Monitoring, Daily
Calibration, Report, and Monthly Diagnostics. The AVU/AIM processing strategy is based on time, with each
AVU/AIM run being defined on 24 hours of observed data. The AIM pipeline starts by selecting the
AstroObservations with gclass ≤ 2. The Raw Data Processing module processes AstroObservations with gclass
equal to 0, 1, or 2 and estimates the image parameters. In processing Cycle 01, the AVU/AIM system used a
PSF/LSF bootstrapping library including specific image-profile templates for each CCD, spectral-type bin, and
gclass. The AVU/AIM system cannot process defined runs when IDT runs in RAW-only mode. The Monitoring
module is a collection of software modules dedicated to extracting information on the instrument health, the
astrometric-instrument calibration parameters, the image quality during in-flight operations, and the comparison
between AVU/AIM and IDT outputs. The Daily Calibration module is devoted to the reconstruction of the Gaia
signal profile on a daily basis. Its workflow also includes diagnostics and validation functions. The
calibration-related diagnostics include the image-moment variations over the focal plane. An automatic tool
validates the reconstructed image profiles before they are used within the AIM chain.
The computing performance depends strongly on the number of AstroObservations in the AIM run. The AIM
pipeline can manage runs of different sizes; the observed range in Cycle-01 processing is between 2 and
11 million AstroObservations. In runs with more than 5 million AstroObservations a filter is activated so that
only the minimum number of data needed in each bin (bins being defined on several instrument and observation
parameters and on time intervals) is processed, without losing quality in the AVU/AIM results. The filter is
usually activated when Gaia is scanning the Galactic plane.
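The run definition, the gclass selection, and the size-dependent filter described above can be sketched together in Python. Observations with gclass ≤ 2 are grouped into 24-hour runs, and only runs larger than the threshold have the number of observations per bin capped. The `AstroObs` record, the run epoch, the bin key, the one-hour time interval, and the per-bin cap are hypothetical; the text does not specify how the AIM filter chooses the minimum number of data per bin.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Hashable, List, Tuple

SECONDS_PER_RUN = 24 * 3600   # each AIM run covers 24 hours of observed data
MAX_GCLASS = 2                # AIM processes gclass 0, 1 and 2
FILTER_THRESHOLD = 5_000_000  # filter activated above this run size


@dataclass(frozen=True)
class AstroObs:
    obs_time_s: float   # observation time in seconds from an assumed epoch
    gclass: int         # value used for the gclass <= 2 selection
    bin_key: Hashable   # hypothetical key over instrument/observation parameters


def group_into_runs(observations: List[AstroObs]) -> Dict[int, List[AstroObs]]:
    """Select gclass <= 2 and group the observations into 24-hour runs."""
    runs: Dict[int, List[AstroObs]] = defaultdict(list)
    for obs in observations:
        if obs.gclass <= MAX_GCLASS:
            runs[int(obs.obs_time_s // SECONDS_PER_RUN)].append(obs)
    return dict(runs)


def filter_run(run: List[AstroObs], per_bin_cap: int,
               threshold: int = FILTER_THRESHOLD,
               bin_seconds: float = 3600.0) -> List[AstroObs]:
    """Cap observations per (bin_key, time interval), for large runs only."""
    if len(run) <= threshold:
        return run  # small runs are processed in full
    counts: Dict[Tuple[Hashable, int], int] = defaultdict(int)
    kept = []
    for obs in sorted(run, key=lambda o: o.obs_time_s):
        key = (obs.bin_key, int(obs.obs_time_s // bin_seconds))
        if counts[key] < per_bin_cap:
            counts[key] += 1
            kept.append(obs)
    return kept
```

Capping per bin rather than subsampling the whole run uniformly keeps every populated bin represented, which matters in dense regions such as the Galactic plane where a few bins would otherwise dominate the run.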
The AVU/BAM pipeline runs with the following modules: Ingestion, Pre-Processing, RDP, Monitoring, Weekly
Analysis, Calibration, Extraction, and Report. In the Raw Data Processing (RDP) module, the following
algorithms are run: Raw Data Processing, Gaiometro, Gaiometro2D, DFT, Chi Square, BAMBin, and comparison
with IDT BamElementary. The AVU/BAM system has two run strategies: IDT and H24. In the IDT strategy, used
from commissioning to December 2015 (covering Gaia DR1), a BAM run is defined when a transfer containing the
BAM data is received at DPCT, and the processing starts automatically without any check on the data. In the
H24 strategy, a BAM run is defined on 24 hours of data, and the processing starts automatically when the data
availability reaches a threshold defined by the BAM payload expert (e.g., 98%–99%). The AVU/BAM system has
been processing with the H24 strategy since December 2015 so that BAM analyses are available at regular
intervals. Since the commissioning phase, the AVU/BAM pipeline has been sending the BAM output at the end of
each run. With the latest operational BAM software release, BAM 19.0.0, the pipeline takes about 60 minutes to
execute all modules.
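The difference between the two BAM run strategies can be illustrated with a sketch of the trigger decision. For the H24 strategy, data availability is taken here to be the fraction of the 24-hour run window covered by the received data intervals (with overlaps merged), compared against a threshold such as 0.98; for the IDT strategy, the run starts as soon as a transfer is received. How availability is actually measured at DPCT is not stated in the text, so this definition and the function names are assumptions.

```python
from typing import List, Optional, Tuple

RUN_LENGTH_S = 24 * 3600.0  # H24: one BAM run per 24 hours of data


def availability(intervals: List[Tuple[float, float]], run_start: float,
                 run_length: float = RUN_LENGTH_S) -> float:
    """Fraction of [run_start, run_start + run_length) covered by data intervals."""
    run_end = run_start + run_length
    # Clip each (start, end) interval to the run window and drop empty ones.
    clipped = sorted((max(s, run_start), min(e, run_end))
                     for s, e in intervals if min(e, run_end) > max(s, run_start))
    covered = 0.0
    cur_start: Optional[float] = None
    cur_end: Optional[float] = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Disjoint from the current merged interval: close it, open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # overlapping or touching: extend it
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / run_length


def should_start_run(strategy: str, transfer_received: bool,
                     intervals: List[Tuple[float, float]], run_start: float,
                     threshold: float = 0.98) -> bool:
    """IDT: start on transfer receipt; H24: start once availability >= threshold."""
    if strategy == "IDT":
        return transfer_received  # no check on the data content
    if strategy == "H24":
        return availability(intervals, run_start) >= threshold
    raise ValueError(f"unknown strategy: {strategy}")
```

Merging overlaps before summing prevents retransmitted or duplicated data segments from being counted twice, which would otherwise let a run start before the threshold is truly met.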
To ensure that the automatic data-reception and ingestion processes run without data losses, DPCT has
implemented and executed a set of procedures that guarantee the consistency of the data inside the DPCT
database. Data-consistency checks are executed on all DPCT data stores and at different times, e.g., before and
after data are used in the data-reduction pipelines. The DPCT data-consistency checks are working as expected,
i.e., the data-management pipelines are reliable.
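One simple way to realise a before/after consistency check is to compare a record count and an order-independent digest of a data store's contents, taken for instance before and after a pipeline reads the data. The sketch below assumes that records can be serialised to bytes and uses SHA-256 from Python's `hashlib`; it illustrates the idea and is not the DPCT procedure, whose details the text does not give.

```python
import hashlib
from typing import Iterable, NamedTuple


class StoreSnapshot(NamedTuple):
    count: int
    digest: str


def snapshot(records: Iterable[bytes]) -> StoreSnapshot:
    """Count records and compute a digest that ignores their order."""
    # Hash each record, then hash the sorted list of record hashes, so the
    # result does not depend on the order in which the store returns rows.
    # Duplicate records still count, because each contributes its own hash.
    record_hashes = sorted(hashlib.sha256(r).digest() for r in records)
    combined = hashlib.sha256(b"".join(record_hashes)).hexdigest()
    return StoreSnapshot(len(record_hashes), combined)


def check_consistency(before: StoreSnapshot, after: StoreSnapshot) -> None:
    """Raise if a data store changed between two checkpoints."""
    if before != after:
        raise RuntimeError(
            f"data store changed: {before.count} -> {after.count} records, "
            f"digest {before.digest[:12]} -> {after.digest[:12]}"
        )
```

Sorting fixed-length record hashes (rather than XOR-ing them) keeps the digest sensitive to duplicated or dropped rows, since a repeated record would cancel itself out under XOR.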
The DPCT data stores are in good shape and they are used to provide data services to all AVU data processing and