Load a Text File Dataset in ML.NET

Introduction

Machine learning has changed how we process and analyze data and how we derive insights and predictions from it. ML.NET is Microsoft’s open-source, cross-platform machine learning framework for .NET, and it lets developers train and use models directly inside their C# or F# applications. A first step in almost any machine learning task is loading a dataset for model training or analysis. In this blog post, we’ll look at how to load a dataset from a text file with ML.NET and prepare it for further processing.

The Dataset

Let’s start with a simple dataset stored in a text file named data.txt. The dataset contains two columns: “City” and “Temperature”. Each row corresponds to a city’s name and its respective temperature. Here’s how the data.txt file looks:

City,Temperature
Rasht,24
Tehran,28
Tabriz,8
Ardabil,4

The Data Transfer Object (DTO)

In ML.NET, we need to create a Data Transfer Object (DTO) that represents the structure of the data we want to load. The DTO is essentially a C# class that matches the schema of our dataset. In our case, we’ll define a DataDto class to represent each row in the data.txt file. Here’s the DataDto.cs file:

using Microsoft.ML.Data; // LoadColumn and ColumnName live in this namespace

public class DataDto
{
    [LoadColumn(0), ColumnName("City")]
    public string City { get; set; }
    
    [LoadColumn(1), ColumnName("Temperature")]
    public float Temperature { get; set; }
}

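The DataDto class has two properties, City and Temperature, which correspond to the columns in the dataset. The properties are decorated with two attributes, LoadColumn and ColumnName. LoadColumn gives the zero-based position of the column in each line of the file that the property should be loaded from, and ColumnName sets the name that column will have in the loaded data. Because the property names here already match the column names, ColumnName is optional in this example; without it, ML.NET uses the property name. It becomes useful when you want a column name that differs from the property name.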
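
To see what ColumnName actually controls, here is a minimal sketch, assuming the project references the Microsoft.ML NuGet package. RenamedDto and SchemaDemo.ColumnNames are hypothetical names used only for illustration: the properties are deliberately named differently from the columns, and the program writes the sample data to a temporary file, loads it, and lists the column names of the resulting IDataView.

```csharp
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical variant of DataDto: the property names differ from the
// desired column names, so ColumnName decides what the columns are called.
public class RenamedDto
{
    [LoadColumn(0), ColumnName("City")]
    public string CityName { get; set; }

    [LoadColumn(1), ColumnName("Temperature")]
    public float TempValue { get; set; }
}

public static class SchemaDemo
{
    // Hypothetical helper: loads the file and returns the names of the
    // columns in the resulting IDataView's schema.
    public static string[] ColumnNames(string path)
    {
        var mlContext = new MLContext();
        IDataView view = mlContext.Data.LoadFromTextFile<RenamedDto>(
            path, separatorChar: ',', hasHeader: true);

        var names = new string[view.Schema.Count];
        for (int i = 0; i < view.Schema.Count; i++)
            names[i] = view.Schema[i].Name;
        return names;
    }

    public static void Main()
    {
        // Write the sample dataset to a temporary file so the sketch runs on its own.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "City,Temperature\nRasht,24\nTehran,28\nTabriz,8\nArdabil,4\n");

        foreach (string name in ColumnNames(path))
            Console.WriteLine(name);
    }
}
```

The column names come from the ColumnName attributes (City and Temperature), not from the property names CityName and TempValue; removing the attributes would make the columns take the property names instead.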

Loading the Dataset

With the DTO in place, we can now load the dataset using ML.NET. The entry point for ML.NET operations is the MLContext class. In our Program.cs, we’ll create an instance of MLContext, specify the path to the text file, and load the data into an IDataView, ML.NET’s representation of tabular data.

using System;
using Microsoft.ML;

public class Program
{
    static void Main()
    {
        // Create an MLContext
        var mlContext = new MLContext();
        
        // Specify the path to the text file dataset.
        // A relative path is resolved against the current working directory.
        string dataPath = "data.txt";
        
        // Load the data into an IDataView, using DataDto to describe the schema
        IDataView dataView = mlContext.Data.LoadFromTextFile<DataDto>(dataPath, separatorChar: ',', hasHeader: true);
        
        // Now you can use the dataView for further processing, like training a model, data analysis, etc.
        // ...
    }
}

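The LoadFromTextFile method takes the path to the dataset file (dataPath), the separator character (a comma here; the default is a tab), and hasHeader. Setting hasHeader: true tells the loader that the first line holds column names and must be skipped rather than parsed as data; the default is false. The generic type argument, DataDto, tells the loader which columns to read and what type each one has.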
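
One quick way to check that the file was parsed as expected is to read the rows back out of the IDataView. Below is a minimal sketch, assuming the Microsoft.ML NuGet package is referenced; LoadRows is a hypothetical helper, and the program writes the sample data to a temporary file so it runs on its own. It uses MLContext.Data.CreateEnumerable, which converts the rows of an IDataView back into DataDto objects.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public class DataDto
{
    [LoadColumn(0), ColumnName("City")]
    public string City { get; set; }

    [LoadColumn(1), ColumnName("Temperature")]
    public float Temperature { get; set; }
}

public static class Program
{
    // Hypothetical helper: loads the comma-separated file and materializes
    // every row back into a DataDto object.
    public static List<DataDto> LoadRows(string path)
    {
        var mlContext = new MLContext();
        IDataView dataView = mlContext.Data.LoadFromTextFile<DataDto>(
            path, separatorChar: ',', hasHeader: true);

        // reuseRowObject: false gives each row its own DataDto instance,
        // which is needed when the rows are collected into a list.
        return mlContext.Data
            .CreateEnumerable<DataDto>(dataView, reuseRowObject: false)
            .ToList();
    }

    public static void Main()
    {
        // Write the sample dataset to a temporary file so the sketch runs on its own.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "City,Temperature\nRasht,24\nTehran,28\nTabriz,8\nArdabil,4\n");

        foreach (DataDto row in LoadRows(path))
            Console.WriteLine($"{row.City}: {row.Temperature}");
    }
}
```

An IDataView is evaluated lazily, so the file’s rows are actually read when CreateEnumerable is enumerated (here, by ToList()). Materializing every row is fine for a four-line sample; for a large file you would enumerate only what you need, for example with Take(5).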

Conclusion

In this blog post, we’ve seen how to load a text file dataset in ML.NET, using a Data Transfer Object (DTO) to define the structure of the data. The LoadFromTextFile method reads the file into an IDataView, which can then be used for further processing, such as training a machine learning model or analyzing the data. Because ML.NET runs inside ordinary .NET applications, the same approach lets you add these capabilities to existing .NET projects.