The NeIC conferences are organised biennially, bringing together around 230 experts, researchers, policy makers, funders and national eInfrastructure providers from the Nordics and beyond. The aim is to create an opportunity for people in the eInfrastructure field to connect and collaborate with colleagues across the Nordics and to enable them to share knowledge and expertise. The title for this year’s event is ‘Nordic Infrastructure for Open Science’. We will take a closer look at how we can collaborate across borders to strengthen the field of Open Science. During the three days of the conference we will show examples of Open Science collaborations taking place in the Nordic region, as well as connect what is happening in the Nordics to a larger, international context – in particular the European Open Science Cloud.
With dozens of government agencies and foundations funding research at over 200 universities and hundreds more institutes and businesses, the United States presents a challenge for comprehensive open science offerings. Approaches by various government agencies to require and incentivize open access to data will be mentioned, as well as platforms and services that enable data sharing and discovery. Models working in the EU are being replicated in the US, providing a basis for increased awareness of and value in open data. The role and challenges of including industry or private research will also be discussed.
(Title given by organisers: Insights on Open Science & EOSC from an RDA perspective)
Within the complex, international landscape of open science and open data, the research data landscape is highly fragmented by discipline and domain. When it comes to cross-disciplinary activities, the notions of "building blocks" of common data infrastructures and of building specific "data bridges" are accepted metaphors for approaching data complexity and enabling data sharing.
The Research Data Alliance (RDA) develops solutions, specifications and best practices enabling data to be shared across barriers through focused Working Groups and Interest Groups, formed of data professionals from all around the world.
This presentation will address where the RDA community stands within the realm of open science and, specifically, the European Open Science Cloud. Does it support or just add another element of complexity?
Observations from the “Internet of things” (IoT), such as intelligent cars, phones, buildings and personal weather stations (PWS), including commodity weather sensors, provide detailed information on local to hyper-local meteorological phenomena. This NordForsk infrastructure project (iOBS) will accommodate an increasing amount and diversity of observation data, and provide a system of harmonised data pooling and merging. The targeted breakthrough and measurable benefit of iOBS is the effective assimilation of diverse observations in regional high-resolution NWP models for the delivery of reliable and accurate weather forecasts and warnings for the benefit of operations, business and society. The basis will be the current operational NWP model, AROME-MetCoOp, and/or the very recent addition of a nowcasting suite. At the same time, there is currently a significant and unnecessary diversity among the national meteorological institutes in the formats, file structures and (local) software used for observation handling and pre-processing. This fragmented data handling introduces redundancies, errors and missing observations, with the consequence that valuable information is lost. iOBS will therefore introduce the Scalable Acquisition and Pre-Processing system (SAPP) for joint observation handling.
The project will enable the use of high-resolution and high-frequency observations. This requires improving, developing and implementing timely quality control (QC) algorithms for a massive amount of private observations of surface pressure. To our knowledge, if successful, this will be the first time private pressure observations are assimilated in an operational NWP system.
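As an illustration only (this is not the iOBS algorithm; the function name and thresholds are hypothetical), a timely QC step for crowd-sourced surface pressure might combine a gross range check with a background check against an NWP first guess:

```python
def qc_surface_pressure(obs_hpa, first_guess_hpa,
                        valid_range=(870.0, 1085.0), max_departure=15.0):
    """Flag a surface-pressure observation (hPa) as accepted or rejected.

    Two illustrative checks:
    - a gross range check against physically plausible surface pressures
    - a background (first-guess) check against a reference value, e.g.
      from a short-range NWP forecast
    Returns True if the observation passes both checks.
    """
    lo, hi = valid_range
    if not (lo <= obs_hpa <= hi):                       # gross range check
        return False
    if abs(obs_hpa - first_guess_hpa) > max_departure:  # background check
        return False
    return True
```

In practice such checks would be tuned per station and combined with, e.g., spatial consistency tests against neighbouring observations.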
The observation data flow will be built in parallel on two future-generation e-infrastructures: MET Norway's PPI and Glenna-2. PPI provides flexibility, scalability in computing and data storage capacity, and full end-to-end data integrity to meet modern requirements on data consistency. PPI offers the benefit of building on existing operational solutions: it runs as an operational environment and acts as a reference for the cloud service. Glenna-2 will make effective use of hybrid environments, combining specialized HPC resources and, for example, container technology with the more flexible cloud delivery model. Having two e-infrastructure solutions offers redundancy and flexibility, addressing the needs and requirements of Nordic (and beyond) research and operations.
The benefits of this 2-year project include:
- Improved NWP forecast quality from the increased number of observations used in data assimilation
- Improved QC algorithms for pre-processing
- Reduced cost for software maintenance and development
- Improved conditions for Nordic research collaboration on both novel technologies and the handling of different observation types
- Knowledge transfer across scientific disciplines and technological domains
- Redundancy and flexibility from using both a cloud-based research infrastructure (Glenna-2) and a proven operational infrastructure (PPI)
- Raised awareness of the benefits of public-private partnerships, e.g. our QC will inform data manufacturers about their data quality
The project partners are CSC, FMI, MET Norway and SMHI.
You can register for the Conference dinner when you register for the NeIC2019 conference. Registration available here: https://www.deltager.no/neic2019_-_nordic_infrastructure_for_open_science_14052019
The dinner takes place at Gemyse restaurant inside the famous and magical theme park, Tivoli Gardens. To join the dinner you must select attendance when registering for the conference. You can look forward to a delicious three-course gourmet dinner in beautiful surroundings. The required entrance ticket to Tivoli Gardens is then included.
+45 88 70 00 00
Shaping up the Nordics for EOSC is based on the work behind the EOSC-Nordic project proposal coordinated by NeIC, which was submitted to the EC during autumn 2018. Shaping up the Nordics for EOSC aims to facilitate the coordination of EOSC-relevant initiatives within the Nordic and Baltic countries and exploit synergies to achieve greater harmonisation of policy and service provisioning across these countries, in compliance with EOSC agreed standards and practices. The project brings together a strong consortium of 24 complementary partners including e-Infrastructure providers, research performing organisations and expert networks, with national mandates and experience with regard to the provision of research data services, and a unique capacity to realise the outcomes of the EOSC design as outlined by the EOSC Implementation Roadmap.
Lifeportal (lifeportal.uio.no) is a web-based interface developed for researchers who do not have advanced computer science expertise but need to perform resource-consuming computational analyses. Lifeportal promotes open science by enabling users to share and reuse the results of these analyses, workflows and data among their collaborators or entire workgroups within one single platform.
Lifeportal is built on the Galaxy platform (galaxyproject.org) and is customized to fit the needs of researchers and students. The unique features of the Lifeportal are:
Advances in the development of climate models and associated data viewers and processing tools have brought unprecedented maturity to the environmental scientific community. This has been accompanied by the standardization of model output formats (the Climate and Forecast metadata conventions), the availability of open databases (e.g., the Earth System Grid Federation), and often of the climate model codes themselves.
Applying such FAIR principles makes it possible to virtually re-run climate model runs or undertake other experiments. On the one hand, such new opportunities should attract interest from other communities, such as the social sciences and humanities. On the other hand, climate models, viewers and processing tools are generally far too complex for non-specialists and computationally demanding, thus hindering cross-disciplinary transfer.
In this presentation we will show how climate models can be run out-of-the-box, without much effort, using an online web platform. We will also show how climate model outputs can be visualized or how deep-learning techniques can be applied using the same web portal.
The Nordic and Baltic genebanks are responsible for the conservation of plant genetic resources for food and agriculture. The e-infrastructure used by genebanks is termed a Genebank Information Management System (GIMS). Implementation and development of a new Nordic-Baltic integrated GIMS, with functionalities that allow for the incorporation of more data (phenotype/genotype), will be of great benefit for breeders and researchers using plant genetic resources. Efficient use of genetic resources depends on an informative database which allows for simple to complex queries, from Boolean searches to more complex queries combining Boolean searches with filtering for phenotypic (for example, morphology, disease resistance, yield, quality parameters) and geographic information. In the future there will also be a need to integrate genotypic (genomics) data on the collections. The aims are to:
- fully integrate all information on clonal material, from primary collections to clonal archives
- develop batch tools for registration of material (including pictures, passport and phenotype data)
- deploy tools to support seed/clone health information (phytosanitary documentation)
- set up direct links to FAO and the ITPGRFA for reporting on PGR
- enable direct export to European (EURISCO) and global (Genesys PGR and GBIF) databases
- provide advanced viewing and filtering methods for phenotypic data
- develop capabilities to integrate geographic information
- increase the ability for Boolean searches across more database tables
- prepare for future genotypic (genomic) data on collections
- provide a “one-stop-shop” for researchers to find and order material from all Nordic-Baltic genebanks
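The kind of combined query described above can be sketched in a few lines of Python; the records, field names and `search` helper below are purely illustrative and not the actual GIMS schema:

```python
# Hypothetical accession records; field names are illustrative only.
accessions = [
    {"id": "NGB-001", "crop": "barley", "disease_resistance": True,
     "country": "NO", "yield_t_ha": 5.2},
    {"id": "NGB-002", "crop": "barley", "disease_resistance": False,
     "country": "SE", "yield_t_ha": 6.1},
    {"id": "NGB-003", "crop": "rye", "disease_resistance": True,
     "country": "EE", "yield_t_ha": 4.0},
]

def search(records, crop=None, resistant=None, min_yield=None, countries=None):
    """Combine Boolean criteria with phenotypic and geographic filters."""
    hits = []
    for r in records:
        if crop is not None and r["crop"] != crop:
            continue
        if resistant is not None and r["disease_resistance"] != resistant:
            continue
        if min_yield is not None and r["yield_t_ha"] < min_yield:
            continue
        if countries is not None and r["country"] not in countries:
            continue
        hits.append(r["id"])
    return hits

# e.g. disease-resistant barley accessions:
search(accessions, crop="barley", resistant=True)  # → ["NGB-001"]
```

A production GIMS would of course express such filters as database queries rather than in-memory loops, but the principle of combining Boolean and phenotypic/geographic criteria is the same.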
Jupyter notebooks combine the accessibility of an interactive web frontend, the reproducibility of a laboratory notebook, and the collaborative potential of a cloud-based deployment. The accessibility and interactivity lower the barrier for researchers to prototype, write, and share data analysis pipelines, and the literate programming approach of Jupyter makes it particularly simple for colleagues and peers to reproduce, reuse, and adjust notebooks.
Jupyter has another use: providing access to remote resources via JupyterHub. Many typical JupyterHub deployments have used cloud-based resources for one-off purposes, but there is also good support for JupyterHub as an interface to HPC clusters and other pre-existing research facilities. JupyterHub can provide a stepping stone for light computing on existing clusters, as well as a more user-friendly interface for preparation and visualization for existing power users.
In this workshop, we will demonstrate the use of JupyterHub and provide guidance so that attendees can set up their own JupyterHub deployments. There will be a show-and-tell of Jupyter itself and existing JupyterHub deployments. We will go over the basic requirements and practical implementation of a JupyterHub setup. The workshop includes discussion of the difference between traditional batch and interactive workloads, and how the parameters of HPC systems can be tuned for interactive use. At the conclusion of the workshop, participants will be well prepared to begin deploying JupyterHub at their own facilities, and a Nordic JupyterHub community will begin to form.
Prerequisites: since we do not cover Jupyter itself, we will share links to talks/lessons on basic Jupyter notebooks in an updated abstract so participants can learn and experiment in advance.
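As a taste of what such a deployment involves, a minimal `jupyterhub_config.py` fragment for a Slurm-based cluster might look like the sketch below. It assumes the third-party batchspawner package; the partition name and resource requests are hypothetical and would need tuning per site:

```python
# jupyterhub_config.py -- illustrative fragment only; assumes the
# third-party batchspawner package is installed on a Slurm cluster.
c = get_config()  # provided by JupyterHub when it loads this file

# Launch each user's notebook server as a Slurm batch job
c.JupyterHub.spawner_class = 'batchspawner.SlurmSpawner'

# Hypothetical resource requests for the spawned job
c.SlurmSpawner.req_partition = 'interactive'
c.SlurmSpawner.req_runtime = '02:00:00'
c.SlurmSpawner.req_memory = '4G'
```

The choice of spawner is the main knob that adapts JupyterHub from cloud one-off use to an existing HPC facility.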
CSC has a new cloud platform called Rahti. It is based on OpenShift, Red Hat's distribution of Kubernetes. It is a generic cloud platform suitable for a wide range of use cases, from hosting web sites to scientific applications. What differentiates it from previous cloud platforms such as cPouta is the ease with which applications can be managed, scaled up and made fault tolerant.
In this session, we will introduce the Rahti platform, tell you how to get access, and show how it can be used with demos of setting up applications such as Apache Spark and Rocket.Chat.
Many scientific fields are using, or would like to use, personal or sensitive data in their research. Such fields include, for example, genomics, health, social sciences and language research. Sensitive data that has been cleared for secondary use should be properly managed and made findable under the same principles as non-sensitive research data. This naturally needs to be done under strict ethical and legal compliance and via secure IT services. However, providing secure e-infrastructure for large cross-border research projects dealing with sensitive data is in great demand and to some extent remains an unsolved challenge. Moreover, the emphasis on open science and FAIR data by science communities and policy-makers increases the demand for professional research data management in connection with sensitive data.
This workshop will discuss the current status, opportunities and challenges of secure e-infrastructure services from various angles, and will try to form conclusions as well as inspire action for supporting open science with sensitive data. The topics for the short presentations and panel discussions have been selected to highlight different sides of the subject, including, for example: openness in a sensitive data landscape, experiences from the Tryggve project from both the service provider and user perspectives, secure processing of distributed data, the impact of the NeIC sensitive data activity, and Research Data Management for sensitive data.
The workshop programme includes short talks followed by panel discussion on a few selected topics. We plan to use online tools as a channel for addressing questions to speakers, and to enable online surveys as an option for speakers to interact with the audience.
Computational power is becoming more and more important. At the same time, the rest of the world is becoming consumerized: while the general expectation is that information technology should be easy to use, the design of high-performance computing (HPC) systems has not kept up with modern developments in computer usability. There are many historical artifacts in how HPC systems are set up: HPC systems are often optimized for data transfer over scp, while users often prefer solutions where remote drives are mounted. We expect computations to fit into nice “rectangular” boxes of number of cores × time × memory, while with modern data science workflows the time and memory can be unknown at the start of a job, and interactive usage in particular leads to highly intermittent CPU and memory requirements. Why is knowing Linux shell scripting a requirement for every job when we want our facilities to be usable by anyone? How can we empower users to have more control over their software stack?
In this workshop, we will explore the largest usability barriers in HPC systems, existing solutions, and create a joint vision of a modern HPC system. The first talks will be presentations on vision and usability from invited speakers from both HPC and human-computer interaction (HCI). After that, there will be brainstorming sessions (guided, in small groups, unconference, or panel discussions) where we identify the biggest pain points. Then, there will be group discussions in a speed-blogging format to create a shared vision document which will be the result of this workshop. After this workshop, there should be additional Nordic infrastructure cooperation to improve the accessibility, and possibly standardization, of large computational resources beyond those who traditionally use them.
"Homework": This is an interactive workshop, so please come prepared. Talk to people at your institution and/or other meeting at NeIC. Poll the people around you: what are the biggest issues with using your institution's computational facilities? Issues can be both general and specific, e.g. "all files have to manually be transferred, but due to the use of ssh proxy hosts there it is difficult from outside the campus network" or “it is easier to pay Amazon than pay us”.
We, at Uninett and Sigma2, have been working on enabling researchers to utilize machine learning by providing a "one click" install of the most used deep learning frameworks, e.g. TensorFlow and PyTorch, with GPU resources to train the models. This workshop will give participants the opportunity to use this platform.
The workshop will provide an introduction to Deep Learning with a hands-on session performing image classification using a Convolutional Neural Network (CNN) in the PyTorch framework. We will also go through a Kaggle competition and engage in one competition for an image classification task. The hands-on tutorial will use Jupyter Notebooks running on the above-mentioned infrastructure.
Participants will gain practical experience in Deep Learning by applying state-of-the-art methods in image classification, and the ability to transfer this knowledge to their own fields. Some experience with Python and Jupyter Notebooks is beneficial to get the most out of the workshop, but no prior Deep Learning knowledge is required.
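For intuition about the core building block of a CNN, the sketch below implements the 2-D "valid" cross-correlation that a convolutional layer performs, in plain Python. This is for illustration only; in the workshop itself PyTorch's `torch.nn.Conv2d` does this work (batched, on the GPU, with learned kernels):

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation, the core operation of a CNN layer.

    image and kernel are lists of lists (rows of numbers); returns the
    resulting feature map as a list of lists.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            # Multiply the kernel with the image patch at (i, j) and sum
            for a in range(kh):
                for b in range(kw):
                    acc += image[i + a][j + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out
```

A CNN stacks many such filters, interleaved with nonlinearities and pooling, and learns the kernel values from data.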
The provision of research data in accordance with the FAIR principles could be seen as a cornerstone of Open Science and will be a major deliverable of the European Open Science Cloud (EOSC). In this session we will explore various aspects of the implementation of FAIR in the Nordic countries, covering perspectives from the researcher to the policy-maker. Topics can include the roles of the national policies compared to university and funder policies, cooperation between universities and service providers, roles of national and international networks and cooperation, outreach to user communities, scalability and long-term sustainability for FAIR data and other implementation aspects of FAIR.
Linux containers, with their build-once-run-anywhere approach, are becoming popular among scientific communities for software packaging and sharing. Docker is the most popular and user-friendly platform for running and managing Linux containers. Singularity is a platform for deploying lightweight containers on HPC systems. Kubernetes is a portable orchestration system for managing containerised workloads. This hands-on tutorial workshop will cover the following: