NumPy example

Description

This example demonstrates how to import external files using the TAR functionality and how to use compatible libraries such as NumPy. The user specifies an input file (passed as the input parameter) that must already be packaged in the fs.tar file. The script imports NumPy, opens the file, performs some basic operations on the data, and then writes the results to output.txt, which becomes the task output.

A full list of compatible libraries is available in the Truebit documentation.

Using Truebit to execute the NumPy example

We use Truebit to execute the NumPy task to get a verified result.

The code

You can find the NumPy example in the folder truebit-nextgen-examples/function-tasks/py/numpy-pandas

The folder contains the following files: task.py, create-fs.sh, and gendata.py.
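The contents of gendata.py are not reproduced on this page. To give an idea of what such a generator might do, here is a minimal sketch, assuming the input CSV has a header row and three columns: A (integers), B (floats between 0 and 1), and C (one of 'X', 'Y', 'Z'), as in the sample output in Step 3. The function name write_sample_csv, the seed, and the value ranges are hypothetical and are not taken from the repository.

```python
import csv
import os

import numpy as np


def write_sample_csv(path, n_rows=10, seed=0):
    """Write a small CSV with columns A, B, C (hypothetical stand-in for gendata.py)."""
    rng = np.random.default_rng(seed)
    a = rng.integers(1, 101, size=n_rows)         # integers in [1, 100]
    b = rng.random(n_rows)                        # floats in [0, 1)
    c = rng.choice(["X", "Y", "Z"], size=n_rows)  # categorical labels
    # Create the parent folder (e.g. "data/") if it does not exist yet
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["A", "B", "C"])
        for a_val, b_val, c_val in zip(a, b, c):
            writer.writerow([int(a_val), f"{b_val:.6f}", str(c_val)])
```

For example, write_sample_csv("data/frame1.csv") produces a file that the task above can parse, since task.py skips the header line and converts the first two columns to floats.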

task.py

import numpy as np
import os

def main():
	# Load filepath of input CSV
	try:
		with open("input.txt", "r") as file:
			input_file = file.read().strip()
	except FileNotFoundError:
		print("Error: The file 'input.txt' does not exist.")
		return
	except Exception as e:
		print(f"An error occurred while reading 'input.txt': {e}")
		return

	# Read the input file (assuming it contains CSV data)
	data = []
	try:
		with open(input_file, 'r') as file:
			header = file.readline().strip().split(',')
			for line in file:
				values = line.strip().split(',')
				data.append(values)
	except FileNotFoundError:
		print(f"Error: The file '{input_file}' does not exist.")
		return
	except Exception as e:
		print(f"An error occurred while reading '{input_file}': {e}")
		return

	# Convert columns to appropriate types
	A = np.array([float(row[0]) for row in data])
	B = np.array([float(row[1]) for row in data])
	C = np.array([row[2] for row in data])

	# Display the original data
	original_output = "Original Data:\n" + str(data)

	# Calculate basic statistics using numpy
	mean_A = np.mean(A)
	std_A = np.std(A)
	sum_B = np.sum(B)
	max_B = np.max(B)

	statistics = (
		f"\nMean of column 'A': {mean_A}\n"
		f"Standard deviation of column 'A': {std_A}\n"
		f"Sum of column 'B': {sum_B}\n"
		f"Maximum value of column 'B': {max_B}"
	)

	# Compute the natural logarithm of (1 + B) for each value of 'B' using numpy
	log_B = np.log(B + 1)  # +1 to avoid log(0)

	# Create a correlation matrix using numpy
	correlation_matrix = np.corrcoef(A, B)
	correlation_output = "Correlation matrix between 'A' and 'B':\n" + str(correlation_matrix)

	# Normalize column 'A' using numpy
	A_normalized = (A - mean_A) / std_A

	# Filter rows where column 'A' is greater than its mean
	filtered_data = np.array([row for row in data if float(row[0]) > mean_A])

	# Build one date per row (as strings for simplicity); not written to output.txt
	dates = (np.datetime64('2023-01-01') + np.arange(len(data))).astype(str)

	# Group by column 'C' (assuming C is categorical with values 'X', 'Y', 'Z')
	unique_C, indices_C = np.unique(C, return_inverse=True)
	group_means_A = [np.mean(A[indices_C == i]) for i in range(len(unique_C))]
	grouped_output = "Mean of 'A' grouped by 'C':\n" + str(dict(zip(unique_C, group_means_A)))

	# Categorize each value of 'A' as 'High' (greater than 50) or 'Low'
	A_category = np.array(['High' if x > 50 else 'Low' for x in A])

	# Handling missing values: introduce some NaN values in 'B' and then fill them
	B_with_nan = B.copy()
	B_with_nan[::3] = np.nan
	B_filled = np.where(np.isnan(B_with_nan), np.nanmean(B_with_nan), B_with_nan)

	# Prepare the output to be written to output.txt
	output_content = (
		original_output + statistics + "\n\n" +
		"Log of B:\n" + str(log_B) + "\n\n" +
		"Normalized A:\n" + str(A_normalized) + "\n\n" +
		"Filtered Data (A > Mean of A):\n" + str(filtered_data) + "\n\n" +
		correlation_output + "\n\n" +
		grouped_output + "\n\n" +
		"A Category:\n" + str(A_category) + "\n\n" +
		"B with NaNs filled:\n" + str(B_filled)
	)

	# Save the results to output.txt
	with open('output.txt', 'w') as file:
		file.write(output_content)

	print("Results written to 'output.txt'.")

if __name__ == "__main__":
    main()
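Two steps in task.py are less obvious than the rest: grouping by column 'C' with np.unique(..., return_inverse=True), and filling NaN values with the mean of the remaining values. The standalone sketch below isolates both on tiny hand-made arrays; the helper names and values are illustrative only and do not come from the example's dataset.

```python
import numpy as np


def group_means(values, labels):
    """Mean of `values` for each distinct label, using np.unique's inverse indices."""
    unique_labels, inverse = np.unique(labels, return_inverse=True)
    # inverse[i] is the position of labels[i] within the sorted unique_labels
    return {str(lab): float(np.mean(values[inverse == k]))
            for k, lab in enumerate(unique_labels)}


def fill_nan_with_mean(values):
    """Replace NaN entries with the mean of the non-NaN entries."""
    return np.where(np.isnan(values), np.nanmean(values), values)


A = np.array([10.0, 20.0, 30.0, 40.0])
C = np.array(["Y", "X", "Y", "X"])
print(group_means(A, C))      # {'X': 30.0, 'Y': 20.0}

B = np.array([np.nan, 1.0, 2.0, np.nan, 3.0])
print(fill_nan_with_mean(B))  # [2. 1. 2. 2. 3.]
```

Because np.unique returns the labels sorted, the group order is alphabetical regardless of the order in which the categories appear in the file.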

In/Out parameters

To receive its input parameter, the Truebit task opens the "input.txt" file and reads the value from it. Currently, only one input parameter is supported.

with open("input.txt", "r") as file:
    input_string = file.read().strip()

To return an output value from the Truebit task, you need to save it in the "output.txt" file.

with open("output.txt", "w") as file:
    file.write(output_string)
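Putting the two snippets together, a function task follows a simple contract: read one parameter from input.txt, do the work, and write the result to output.txt. The sketch below follows that contract. The process() function is a hypothetical placeholder for the real task logic, and run_locally() only imitates the file exchange in a temporary folder so the logic can be checked without a Truebit node; it is not how Truebit itself invokes the task.

```python
import os
import tempfile


def process(input_string):
    """Hypothetical task logic: report the parameter and its length."""
    return f"Received '{input_string}' ({len(input_string)} characters)"


def main():
    # Read the single input parameter
    with open("input.txt", "r") as file:
        input_string = file.read().strip()
    output_string = process(input_string)
    # Write the result to the file the task output is read from
    with open("output.txt", "w") as file:
        file.write(output_string)


def run_locally(parameter):
    """Simulate the input.txt/output.txt exchange in a throwaway directory."""
    previous_dir = os.getcwd()
    with tempfile.TemporaryDirectory() as work_dir:
        os.chdir(work_dir)
        try:
            with open("input.txt", "w") as file:
                file.write(parameter)
            main()
            with open("output.txt", "r") as file:
                return file.read()
        finally:
            # Restore the working directory before the temp folder is deleted
            os.chdir(previous_dir)
```

For example, run_locally("data/frame1.csv") returns "Received 'data/frame1.csv' (15 characters)".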

create-fs.sh

This example uses external files as input data. The create-fs.sh script is a shell script that runs the tar command to package these files into a single archive. The resulting fs.tar file is used as input data for the function task.
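The contents of create-fs.sh are not shown here. If you prefer to build the archive from Python, the standard library's tarfile module can produce an equivalent archive. This is a sketch, assuming the data files live in a local data/ folder (matching the data/frame1.csv parameter used in Step 3) and that fs.tar should store them under that same relative path; check create-fs.sh for the layout the example actually expects. The function names are hypothetical.

```python
import os
import tarfile


def create_fs_tar(source_dir="data", archive_path="fs.tar"):
    """Package `source_dir` into an uncompressed tar archive, keeping relative paths."""
    with tarfile.open(archive_path, "w") as tar:
        # arcname stores entries as e.g. "data/frame1.csv" rather than an absolute path
        tar.add(source_dir, arcname=os.path.basename(os.path.normpath(source_dir)))
    return archive_path


def list_tar(archive_path="fs.tar"):
    """Return the member names stored in the archive."""
    with tarfile.open(archive_path, "r") as tar:
        return tar.getnames()
```

tar.add() recurses into directories by default, so every file under data/ ends up in the archive.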

Try out

We will use the Truebit CLI to compile and try our source code. Once the code is finished, we will deploy it to the coordination hub so that anyone who knows the namespace and task name can execute it.

Step 1: Create the NumPy example source code

Within the numpy-pandas folder you will find a file called task.py, which contains the source code shown above.

Step 2: Build the source code

Run the build command against the Truebit node to get an instrumented Truebit task:

truebit build truebit-nextgen-examples/function-tasks/py/numpy-pandas/src -l py

Output

Building the function task
Your function task is ready.
Use the following TaskId for execution or deployment:
py_0135863f3dc093c2c2286fd8584c5d1313fa14480e6de7efa54ca4214647842f

The taskId always starts with the language prefix, followed by an underscore ("_") and the cID.
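Because the taskId has this fixed shape, it can be split back into its parts, for example when scripting builds and deployments. The sketch below pulls the taskId from build output like the sample above (taking the line after "Use the following TaskId...") and splits it at the first underscore. It assumes the language prefix itself never contains an underscore, as with py; the function names are hypothetical.

```python
def split_task_id(task_id):
    """Split '<language>_<cID>' into (language, cID); assumes the prefix has no underscore."""
    language, sep, cid = task_id.partition("_")
    if not sep or not language or not cid:
        raise ValueError(f"Unexpected taskId format: {task_id!r}")
    return language, cid


def task_id_from_build_output(output):
    """Return the line that follows 'Use the following TaskId' in the build output."""
    lines = [line.strip() for line in output.splitlines()]
    for i, line in enumerate(lines):
        if line.startswith("Use the following TaskId") and i + 1 < len(lines):
            return lines[i + 1]
    raise ValueError("No taskId found in build output")
```

Applied to the build output above, split_task_id returns ("py", "0135863f...842f").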

Step 3: Try the code

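Run the start command against the Truebit node to test the task. Pass the instrumented taskId followed by the input parameter, which here is the path of the CSV file packaged in fs.tar. Note that the sample output below uses pandas DataFrame formatting (for example, "Original DataFrame:"), so its layout differs from the plain-text report written by the task.py listing above.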

truebit start py_0135863f3dc093c2c2286fd8584c5d1313fa14480e6de7efa54ca4214647842f data/frame1.csv
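To drive this step from a script, one option is to call the same command through Python's subprocess module and extract the execution ID and status from the text it prints. This is a sketch, assuming the truebit executable is on PATH, that it prints to standard output, and that the output keeps the "Execution Id:" and "Task executed with status:" lines shown in the sample output. The function names are hypothetical, and no CLI options beyond the command above are used.

```python
import subprocess


def start_task(task_id, parameter):
    """Run `truebit start <taskId> <parameter>` and return its standard output."""
    result = subprocess.run(
        ["truebit", "start", task_id, parameter],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_start_output(text):
    """Extract the execution ID and status from the start command's output."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Execution Id:"):
            info["execution_id"] = line.split(":", 1)[1].strip()
        elif line.startswith("Task executed with status:"):
            info["status"] = line.split(":", 1)[1].strip()
    return info
```

For example, parse_start_output(start_task(task_id, "data/frame1.csv")) would yield a dictionary such as {"execution_id": "5f86c995-...", "status": "succeed"} for output like the sample below.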

Output

Executing the function Task
Execution Id: 5f86c995-82a2-467d-aec0-da668ae96b32
Task executed with status: succeed
Task executed with result: Original DataFrame:
 A        B C
17 0.985450 Y
31 0.199326 Y
18 0.432207 Z
69 0.609930 Z
 8 0.193569 Y
58 0.113935 Y
54 0.656381 Y
63 0.269771 Z
91 0.790736 X
35 0.999061 Z
Mean of column 'A': 44.4
Standard deviation of column 'A': 25.377943179067916
Sum of column 'B': 5.250366498443773
Maximum value of column 'B': 0.9990610723341307

Modified DataFrame:
 A        B C    log_B  A_normalized       date A_category  B_filled
17      NaN Y 0.685846     -1.079678 2024-01-01        Low  0.333257
31 0.199326 Y 0.181760     -0.528018 2024-01-02        Low  0.199326
18 0.432207 Z 0.359217     -1.040273 2024-01-03        Low  0.432207
69      NaN Z 0.476191      0.969346 2024-01-04       High  0.333257
 8 0.193569 Y 0.176948     -1.434316 2024-01-05        Low  0.193569
58 0.113935 Y 0.107899      0.535898 2024-01-06       High  0.113935
54      NaN Y 0.504635      0.378281 2024-01-07       High  0.333257
63 0.269771 Z 0.238837      0.732920 2024-01-08       High  0.269771
91 0.790736 X 0.582627      1.836240 2024-01-09       High  0.790736
35      NaN Z 0.692678     -0.370400 2024-01-10        Low  0.333257

Filtered DataFrame (A > Mean of A):
 A        B C    log_B  A_normalized
69 0.609930 Z 0.476191      0.969346
58 0.113935 Y 0.107899      0.535898
54 0.656381 Y 0.504635      0.378281
63 0.269771 Z 0.238837      0.732920
91 0.790736 X 0.582627      1.836240

Correlation matrix between 'A' and 'B':
[[1.         0.09016586]
 [0.09016586 1.        ]]

Mean of 'A' grouped by 'C':
C     A
X 91.00
Y 33.60
Z 46.25
Task resource usage:
┌───────────────┬───────────────┬───────────────┬───────┬─────────────┬──────────┐
│ (index)       │ Unit          │ Limits        │ Peak  │ Last status │ Capacity │
├───────────────┼───────────────┼───────────────┼───────┼─────────────┼──────────┤
│ Gas           │ 'Steps'       │ 1099511627776 │ 'N/A' │ 15073880553 │ 73       │
│ Nested calls  │ 'Calls'       │ 65536         │ 400   │ 0           │ 164      │
│ Frame memory  │ 'Bytes'       │ 524288        │ 5054  │ 0           │ 104      │
│ System memory │ '64 Kb pages' │ 4096          │ 2232  │ '𝚫 1'       │ 2        │
└───────────────┴───────────────┴───────────────┴───────┴─────────────┴──────────┘
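As a cross-check of this output, the main statistics can be recomputed with NumPy from the A, B, and C values printed in the original table. The B values are rounded to six decimals in the printout, so the sum, the NaN fill value, and the correlation match only approximately. Note also that np.std uses the population formula (ddof=0) by default, which is what the reported standard deviation of 25.3779... corresponds to.

```python
import numpy as np

# Values copied from the "Original DataFrame" table above (B rounded to 6 decimals)
A = np.array([17, 31, 18, 69, 8, 58, 54, 63, 91, 35], dtype=float)
B = np.array([0.985450, 0.199326, 0.432207, 0.609930, 0.193569,
              0.113935, 0.656381, 0.269771, 0.790736, 0.999061])
C = np.array(["Y", "Y", "Z", "Z", "Y", "Y", "Y", "Z", "X", "Z"])

print("mean A:", np.mean(A))                   # 44.4
print("std A (ddof=0):", np.std(A))            # ~25.37794
print("sum B:", np.sum(B))                     # ~5.250366
print("corr(A, B):", np.corrcoef(A, B)[0, 1])  # ~0.0902

# Group means of A by C, computed the same way as in task.py
labels, inverse = np.unique(C, return_inverse=True)
for k, label in enumerate(labels):
    print(label, np.mean(A[inverse == k]))     # X 91.0, Y 33.6, Z 46.25

# NaN fill: every third B value replaced by the mean of the remaining values
B_nan = B.copy()
B_nan[::3] = np.nan
print("fill value:", np.nanmean(B_nan))        # ~0.333257
```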

Step 4: Deploy the task

Finally, run the deploy command to deploy the task to the coordination hub, so that anyone with the namespace, task name, and API key can execute it.

truebit deploy <namespace> <taskname> --taskId py_0135863f3dc093c2c2286fd8584c5d1313fa14480e6de7efa54ca4214647842f

Output

The function task has been deployed successfully.
The Task <taskName> version <version> was registered successfully in namespace <namespace>
