ERDC Library Menu

Contact

erdclibrary@ask-a-librarian.info

601.501.7632 - text
601.634.2355 - voice

About the Library

The ERDC Library supports the mission-related research needs of ERDC scientists and engineers at three physical locations with a centralized library catalog and web site. It also hosts an online digital repository of ERDC-authored reports.

The ERDC Library collection is available for interlibrary loan. Please contact your local library for all interlibrary loan requests. Other requests should be directed to the reference staff.

Additionally, the library provides access to:

  • 300,000+ items in the collection
  • 28,000+ online journals
  • 34,000+ online books & reports
  • Online research resources including IEEE, Science Direct, Web of Science, RefWorks
  • Collection development and interlibrary loan services
  • Research consultations, training, and outreach services
  • Support for copyright questions and for research and administrative initiatives

Publication Notices

Tag: Datasets
  • Data Lake Ecosystem Workflow

    Abstract: The Engineer Research and Development Center, Information Technology Laboratory’s (ERDC-ITL’s) Big Data Analytics team specializes in the analysis of large-scale datasets with capabilities across four research areas that require vast amounts of data to inform and drive analysis: large-scale data governance, deep learning and machine learning, natural language processing, and automated data labeling. Unfortunately, data transfer between government organizations is a complex and time-consuming process requiring coordination of multiple parties across multiple offices and organizations. Past successes in large-scale data analytics have placed a significant demand on ERDC-ITL researchers, highlighting that few individuals fully understand how to successfully transfer data between government organizations; future project success therefore depends on a small group of individuals to efficiently execute a complicated process. The Big Data Analytics team set out to develop a standardized workflow for the transfer of large-scale datasets to ERDC-ITL, in part to educate peers and future collaborators on the process required to transfer datasets between government organizations. Researchers also aim to increase workflow efficiency while protecting data integrity. This report provides an overview of the created Data Lake Ecosystem Workflow by focusing on the six phases required to efficiently transfer large datasets to supercomputing resources located at ERDC-ITL.