Reader - API Reference
warprec.data.reader.base_reader.ReaderFactory
Factory class for creating Reader instances based on configuration.
Source code in warprec/data/reader/base_reader.py
get_reader(config) (classmethod)
Factory method to get the appropriate Reader instance based on the configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `WarpRecConfiguration` | The WarpRec configuration. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `Reader` | `Reader` | An instance of a class that extends the `Reader` abstract class. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the reading method specified in the configuration is unknown. |
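The dispatch performed by `get_reader` can be sketched as follows. This is a minimal illustration, not WarpRec's implementation: the configuration is modeled as a plain dict, and the key `reading_method`, the values `"local"` and `"azure_blob"`, and the stand-in reader classes are all hypothetical.

```python
from abc import ABC
from typing import Any, Dict


class Reader(ABC):
    """Stand-in for WarpRec's abstract Reader base class."""


class LocalReader(Reader):
    """Stand-in for the local-filesystem reader."""


class AzureBlobReader(Reader):
    """Stand-in for the Azure Blob Storage reader."""

    def __init__(self, storage_account_name: str, container_name: str) -> None:
        self.storage_account_name = storage_account_name
        self.container_name = container_name


class ReaderFactory:
    """Picks a Reader subclass from a (hypothetical) dict-based config."""

    @classmethod
    def get_reader(cls, config: Dict[str, Any]) -> Reader:
        # Hypothetical key; the real WarpRecConfiguration schema may differ.
        method = config.get("reading_method")
        if method == "local":
            return LocalReader()
        if method == "azure_blob":
            return AzureBlobReader(
                storage_account_name=config["storage_account_name"],
                container_name=config["container_name"],
            )
        # Mirrors the documented behavior for unknown reading methods.
        raise ValueError(f"Unknown reading method: {method!r}")
```

Keeping the branching inside a single classmethod means callers depend only on the `Reader` interface and never construct concrete readers themselves.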
warprec.data.reader.local_reader.LocalReader
Bases: Reader
This class extends Reader and handles data reading from a local machine.
Source code in warprec/data/reader/local_reader.py
load_model_state(local_path)
Loads a model state from a given path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `local_path` | `str` | The path to the model state file. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `dict` | `dict` | The deserialized model information (e.g., weights, hyperparameters), loaded using … |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the model state was not found in the provided path. |
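A minimal sketch of the contract documented above: return a dict, and raise `FileNotFoundError` for a missing path. It is a hypothetical stand-alone function, not `LocalReader`'s code, and it swaps in the standard library's `pickle` for deserialization so it runs without extra dependencies; the serialization format `LocalReader` actually uses is not stated here.

```python
import os
import pickle
from typing import Any, Dict


def load_model_state(local_path: str) -> Dict[str, Any]:
    """Load a serialized model state (e.g., weights, hyperparameters) from disk.

    pickle is a stand-in deserializer; WarpRec's actual format may differ.
    """
    # Match the documented contract: a missing file raises FileNotFoundError.
    if not os.path.isfile(local_path):
        raise FileNotFoundError(f"Model state not found at: {local_path}")
    with open(local_path, "rb") as f:
        state = pickle.load(f)
    return state
```

Checking for the file up front gives a clearer error message than the one `open` would raise on its own.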
read_json(*args, **kwargs)
read_json_split(*args, **kwargs)
read_parquet(local_path, column_names=None, *args, **kwargs)
Reads data from a local parquet file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `local_path` | `str` | The local file path to the parquet data. | required |
| `column_names` | `Optional[List[str]]` | A list of specific columns to read. | `None` |
| `*args` | `Any` | Additional positional arguments. | `()` |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |
Returns:
| Type | Description |
|---|---|
| `DataFrame[Any]` | A Narwhals DataFrame containing the data. |
read_tabular(local_path, column_names=None, dtypes=None, sep='\t', header=True, *args, **kwargs)
Reads tabular data (e.g., CSV, TSV) from a local file.
The file content is read into memory and then parsed by the parent class's stream processor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `local_path` | `str` | The local file path to the tabular data. | required |
| `column_names` | `Optional[List[str]]` | A list of expected column names. | `None` |
| `dtypes` | `Optional[Dict[str, str]]` | A dict of data types corresponding to … | `None` |
| `sep` | `str` | The delimiter character used in the file. Defaults to tab (`'\t'`). | `'\t'` |
| `header` | `bool` | Whether the file has a header row. Defaults to `True`. | `True` |
| `*args` | `Any` | Additional positional arguments. | `()` |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |
Returns:
| Type | Description |
|---|---|
| `DataFrame[Any]` | A DataFrame containing the tabular data. Returns an empty DataFrame if the file is not found. |
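To illustrate how `sep`, `header`, `column_names`, and `dtypes` interact, here is a stdlib-only sketch that parses already-read text into a list of dicts rather than a DataFrame. The function `parse_tabular`, the dtype names `"int"`, `"float"`, and `"str"`, and the column-selection behavior are assumptions made for illustration, not WarpRec's documented semantics.

```python
import csv
import io
from typing import Any, Callable, Dict, List, Optional

# Hypothetical dtype names; the strings WarpRec accepts are not documented here.
_CASTS: Dict[str, Callable[[str], Any]] = {"int": int, "float": float, "str": str}


def parse_tabular(
    text: str,
    column_names: Optional[List[str]] = None,
    dtypes: Optional[Dict[str, str]] = None,
    sep: str = "\t",
    header: bool = True,
) -> List[Dict[str, Any]]:
    """Parse delimited text into row dicts (a stand-in for a DataFrame)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    if not rows:
        return []  # Empty input yields an empty result.
    if header:
        # First row supplies the names; column_names (if any) selects a subset.
        names, rows = rows[0], rows[1:]
    else:
        # No header row: column_names names the columns positionally.
        names = column_names or [f"col_{i}" for i in range(len(rows[0]))]
    records = [dict(zip(names, row)) for row in rows]
    if header and column_names:
        records = [{c: rec[c] for c in column_names} for rec in records]
    # Cast the requested columns; everything else stays a string.
    for col, dtype in (dtypes or {}).items():
        cast = _CASTS[dtype]
        for rec in records:
            if col in rec:
                rec[col] = cast(rec[col])
    return records
```

For example, `parse_tabular("1,10\n2,20\n", column_names=["u", "i"], sep=",", header=False)` names the two columns of a headerless comma-separated file.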
warprec.data.reader.azureblob_reader.AzureBlobReader
Bases: Reader
This class extends Reader and handles data reading from an Azure Blob Storage container.
It uses DefaultAzureCredential to authenticate, which relies on environment variables or other standard Azure identity sources.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `storage_account_name` | `str` | The name of the Azure Storage Account. | required |
| `container_name` | `str` | The name of the container where data is stored. | required |
| `backend` | `str` | The backend to use for reading data. | `'polars'` |
Source code in warprec/data/reader/azureblob_reader.py
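For orientation, this is how a storage account name and `DefaultAzureCredential` are commonly combined with the Azure SDK for Python (`azure-identity` and `azure-storage-blob`). Whether `AzureBlobReader` builds its client exactly this way is an assumption; the helper names `blob_account_url` and `make_container_client` are hypothetical, and the endpoint suffix shown is the public Azure cloud's.

```python
from typing import Any


def blob_account_url(storage_account_name: str) -> str:
    """Blob endpoint for a storage account in the public Azure cloud."""
    # Sovereign clouds (e.g., Azure China) use a different domain suffix.
    return f"https://{storage_account_name}.blob.core.windows.net"


def make_container_client(storage_account_name: str, container_name: str) -> Any:
    """Hypothetical helper: authenticate with DefaultAzureCredential.

    Imports are deferred so this module loads without the Azure SDK installed.
    """
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient

    # DefaultAzureCredential tries environment variables, managed identity,
    # Azure CLI login, and other standard sources in turn.
    service = BlobServiceClient(
        account_url=blob_account_url(storage_account_name),
        credential=DefaultAzureCredential(),
    )
    return service.get_container_client(container_name)
```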
load_model_state(blob_name)
Loads a model state from a blob in the configured container.
Downloads the blob content as bytes and uses joblib.load to deserialize the model state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `blob_name` | `str` | The path/name of the blob containing the serialized model state (e.g., a …). | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `dict` | `dict` | A dictionary representing the loaded model state (e.g., weights, hyperparameters). |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the model state blob is not found. |
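The download-then-deserialize flow described above can be sketched as follows. Assumptions: the function takes an Azure `ContainerClient`-like object, `BlobNotFound` is a hypothetical stand-in for `azure.core.exceptions.ResourceNotFoundError` so the sketch runs without the SDK, and `pickle` replaces `joblib.load` (which also accepts a file object) to avoid the extra dependency.

```python
import io
import pickle
from typing import Any, Dict


class BlobNotFound(Exception):
    """Hypothetical stand-in for azure.core.exceptions.ResourceNotFoundError."""


def load_model_state_from_blob(container_client: Any, blob_name: str) -> Dict[str, Any]:
    """Download a blob as bytes and deserialize it into a model-state dict."""
    try:
        # With the Azure SDK, ContainerClient.download_blob() returns a
        # StorageStreamDownloader whose readall() yields the content as bytes.
        payload: bytes = container_client.download_blob(blob_name).readall()
    except BlobNotFound as exc:
        # Surface a missing blob as FileNotFoundError, matching the Raises table.
        raise FileNotFoundError(f"Model state blob not found: {blob_name}") from exc
    # joblib.load(io.BytesIO(payload)) would be the joblib equivalent.
    return pickle.load(io.BytesIO(payload))
```

Wrapping the bytes in `io.BytesIO` lets a file-oriented deserializer read the blob without writing it to disk first.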
read_json(*args, **kwargs)
read_json_split(*args, **kwargs)
read_parquet(blob_name, column_names=None, *args, **kwargs)
Reads parquet data from a blob.
Downloads the blob content as bytes and parses it with the inherited `_process_parquet_data` method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `blob_name` | `str` | The path/name of the blob containing the parquet data. | required |
| `column_names` | `Optional[List[str]]` | A list of specific columns to read. | `None` |
| `*args` | `Any` | Additional positional arguments. | `()` |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |
Returns:
| Type | Description |
|---|---|
| `DataFrame[Any]` | A Narwhals DataFrame containing the data. |
read_tabular(blob_name, column_names, dtypes, sep='\t', header=True, *args, **kwargs)
Reads tabular data from a blob by feeding it to the parent class's stream processor.
Downloads the blob content as a string and parses it with the inherited `_process_tabular_stream` method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `blob_name` | `str` | The path/name of the blob containing the tabular data. | required |
| `column_names` | `Optional[List[str]]` | A list of expected column names. | required |
| `dtypes` | `Optional[Dict[str, str]]` | A dict of data types corresponding to … | required |
| `sep` | `str` | The delimiter character used in the file. Defaults to tab (`'\t'`). | `'\t'` |
| `header` | `bool` | Whether the file has a header row. Defaults to `True`. | `True` |
| `*args` | `Any` | Additional positional arguments. | `()` |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |
Returns:
| Name | Type | Description |
|---|---|---|
| `DataFrame` | `DataFrame` | A Pandas DataFrame containing the tabular data. Returns an empty DataFrame if the blob is not found. |