To everyone who was interested in this proposal, I apologize for the late response.
Due to organizational turmoil within my company, this week has been very hectic.
I currently work for a cybersecurity company in the Web2 sector in Japan. However, the management team has become so preoccupied with the stock price that it has lost sight of its original mission: protecting Japan’s Web2 infrastructure. The company is now in a state where it welcomes social turmoil, and the organization is in disarray.
This week I was reminded once again of the value of Astar, one of the few blockchain projects with an admirable philosophy. I sincerely hope that Astar will not lose sight of its original purpose.
Because of all this, I was only able to devote two days to Astar this week.
Even so, we made progress on the AI-based development progress tracking feature, so I would like to share where things stand.
## AI Selection
We investigated DeepSeek (Standard), DeepSeek-VL, DeepSeek-Coder, Phi-3-mini, and others, and will introduce two particularly interesting models here.
### DeepSeek-VL
It became clear that a vision-language model like this is needed to understand the diagrams and mathematical formulas that frequently appear in PDF whitepapers.
To run this model, an AWS g5.xlarge (GPU) instance is required, which brings the monthly cost to about $734 (presumably on-demand pricing of roughly $1.006/hour for a 730-hour month).
I believe this is quite expensive for a VPS used solely to retrieve various dApp metrics.
Additionally, after reviewing a number of whitepapers from different dApps, I found that their documentation methods vary widely, and many do not even mention implementation details. This makes them unsuitable for tracking development progress.
As a result, although this model was intriguing, we did not proceed to the verification stage. Another reason we skipped testing was the long wait time required to gain access to GPU instances on AWS.
### DeepSeek-Coder
This model cannot read PDFs, but it specializes in coding and excels at the kind of detailed code analysis that other models cannot match.
It also does not require a GPU and has been verified to run on an AWS t3.xlarge instance (4 vCPUs/16 GB memory).
The estimated monthly cost is therefore about $121.47 (presumably on-demand pricing of roughly $0.1664/hour for a 730-hour month).
However, this approach requires the project to additionally provide a simple functional requirements document.
That said, this is probably a correct and reasonable approach.
## PoC
We have deployed DeepSeek-Coder on an AWS t3.xlarge instance.
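As a side note, the quantized model file used by the script further below is a Q4_K_M GGUF build of deepseek-coder-6.7b-instruct. Here is a minimal sketch of one way to fetch such a file; the repository id is an assumption on my part, and any equivalent GGUF build placed at the path configured in the script will do.

```python
# Sketch only: one way to fetch the quantized model referenced by the script.
# The repo_id is an assumption (any Q4_K_M GGUF of deepseek-coder-6.7b-instruct works).
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="TheBloke/deepseek-coder-6.7B-instruct-GGUF",  # assumed source repo
    filename="deepseek-coder-6.7b-instruct.Q4_K_M.gguf",
    local_dir="./llama.cpp/models",
)
```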
Next, we gave this proposal a project name, AstarWatch, and created the following requirements specification document (spec.md).
```markdown
# AstarWatch Functional Requirements Specification
## Obtaining dApp Development Progress Status
* Calculate the progress rate by comparing the functional requirements document (spec.md) with the code placed in the code directory.
## Obtaining dApp User Acquisition Status
* Issue an API request to AstarNode to retrieve the number of user wallets associated with the dApp's contract address.
## Obtaining dApp Marketing Metrics
* Retrieve the number of impressions generated by the social media accounts conducting marketing for the dApp.
```
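The PoC below only covers the first item. For the user-acquisition item, one possible direction, assuming the dApp is an EVM contract on Astar, is to scan the contract's event logs through a public EVM RPC endpoint and count the unique wallets that sent the underlying transactions. The sketch below is untested; the endpoint URL, the block window, and the use of event senders as a proxy for "user wallets" are all assumptions, and Wasm (ink!) contracts would need a different approach.

```python
# Untested sketch: approximate "user wallets" for an EVM dApp by counting the
# unique senders of transactions that emitted events from its contract.
# The RPC URL, placeholder address, and block window are assumptions; public
# endpoints often cap get_logs ranges, so a real job would paginate.
from web3 import Web3

ASTAR_EVM_RPC = "https://evm.astar.network"                   # assumed public endpoint
DAPP_CONTRACT = "0x0000000000000000000000000000000000000000"  # placeholder address

w3 = Web3(Web3.HTTPProvider(ASTAR_EVM_RPC))
latest = w3.eth.block_number
logs = w3.eth.get_logs({
    "address": Web3.to_checksum_address(DAPP_CONTRACT),
    "fromBlock": latest - 1000,
    "toBlock": latest,
})

wallets = set()
for log in logs:
    tx = w3.eth.get_transaction(log["transactionHash"])
    wallets.add(tx["from"])

print(f"Unique interacting wallets in the window: {len(wallets)}")
```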
I wrote the following script (check_progress.py), which passes a prompt together with each code file to DeepSeek-Coder (running via llama.cpp) and checks whether the features in spec.md are implemented.
```python
import os
import re
import subprocess

# ===== CONFIG =====
LLAMA_CLI_PATH = "./llama.cpp/build/bin/llama-cli"
MODEL_PATH = "./llama.cpp/models/deepseek-coder-6.7b-instruct.Q4_K_M.gguf"
SPEC_PATH = "spec.md"
CODE_DIR = "code"
REPORT_PATH = "report.md"
DEBUG_LOG_PATH = "debug.log"
MAX_TOKENS = 2048
MAX_CODE_CHARS = 4000
# ==================


def parse_spec_with_bullets(path):
    """Parse spec.md into a list of (section title, bullet text) pairs."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()

    specs = []
    current_title = ""
    current_body = []
    for line in lines:
        if line.startswith("## "):
            if current_title:
                specs.append((current_title, "\n".join(current_body).strip()))
            current_title = line.strip().replace("## ", "")
            current_body = []
        elif line.startswith("* "):
            current_body.append(line.strip("* ").strip())
    if current_title:
        specs.append((current_title, "\n".join(current_body).strip()))
    return specs


def load_code_files():
    """Load every supported source file under CODE_DIR, truncated to MAX_CODE_CHARS."""
    code_map = {}
    if os.path.exists(CODE_DIR):
        for fname in os.listdir(CODE_DIR):
            if fname.endswith((".py", ".ts", ".js", ".sol", ".rs")):
                path = os.path.join(CODE_DIR, fname)
                with open(path, "r", encoding="utf-8") as f:
                    code_map[fname] = f.read()[:MAX_CODE_CHARS]
    return code_map


def run_llama(prompt):
    """Run llama.cpp with the given prompt and return the raw model output."""
    with open(DEBUG_LOG_PATH, "a", encoding="utf-8") as dbg:
        dbg.write("\n\n========== PROMPT ==========\n")
        dbg.write(prompt + "\n")

    result = subprocess.run(
        [
            LLAMA_CLI_PATH,
            "-m", MODEL_PATH,
            "-p", prompt,
            "-n", str(MAX_TOKENS),
        ],
        stdout=subprocess.PIPE,
        text=True,
    )

    with open(DEBUG_LOG_PATH, "a", encoding="utf-8") as dbg:
        dbg.write("\n\n========== RESPONSE ==========\n")
        dbg.write(result.stdout + "\n")
    return result.stdout.strip()


def evaluate(specs, code_map):
    """Ask the model, for each spec item, whether any code file implements it."""
    matched = 0
    with open(REPORT_PATH, "w", encoding="utf-8") as rep:
        rep.write("# AstarWatch Progress Report (Simplified)\n\n")
        for title, detail in specs:
            found = False
            for fname, code in code_map.items():
                prompt = f"""[QUESTION]
Does the code below implement the specification?
[FORMAT]
Answer only on the first line using: 'Answer: Yes' or 'Answer: No'.
[Specification Title]
{title}
[Details]
{detail}
[Code]
{code}
"""
                answer = run_llama(prompt)
                # Pick the first line of the model output that looks like "Answer: ...".
                first_line = next(
                    (line.strip() for line in answer.splitlines()
                     if line.strip().lower().startswith("answer:")),
                    "No Answer",
                )
                if re.match(r"(?i)^Answer:\s*Yes\b", first_line):
                    matched += 1
                    rep.write(f"- ✅ {title} → `{fname}`\n")
                    found = True
                    break
            if not found:
                rep.write(f"- ❌ {title}\n")

        total = len(specs)
        rep.write(f"\n## ✅ Progress Score: {matched}/{total} ({(matched / total) * 100:.1f}%)\n")


def main():
    # Start from a clean debug log and report on every run.
    if os.path.exists(DEBUG_LOG_PATH):
        os.remove(DEBUG_LOG_PATH)
    if os.path.exists(REPORT_PATH):
        os.remove(REPORT_PATH)

    specs = parse_spec_with_bullets(SPEC_PATH)
    code_map = load_code_files()
    evaluate(specs, code_map)


if __name__ == "__main__":
    main()
```
I then ran the script against itself, with check_progress.py placed in the code directory, to see how much of spec.md it satisfies, and got the following report (report.md).
```markdown
# AstarWatch Progress Report (Simplified)

- ✅ Obtaining dApp Development Progress Status → `check_progress.py`
- ❌ Obtaining dApp User Acquisition Status
- ❌ Obtaining dApp Marketing Metrics

## ✅ Progress Score: 1/3 (33.3%)
```
Therefore, I believe we have been able to confirm that it works as a minimal PoC.
## Issues
- This PoC took 43 minutes to execute. That leaves several things to consider before incorporating it into Astar Portal, but since this is not a feature that requires immediate results, I don't think it is a big problem.
- Currently, only a single program can be evaluated at once. To achieve the original goal, we need a mechanism that splits a codebase into functions or classes, feeds each chunk to the AI, and integrates the results (a rough sketch of such a chunking step is attached at the end of this post). Doing so will, however, increase processing time even further.
- For this PoC to work properly, each project needs to provide a spec.md written with at least a minimum level of granularity, and I am a little worried about whether this requirement will be accepted.
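As mentioned in the second issue, the script currently sends each source file to the model as one blob. Below is a rough, not-yet-implemented sketch of the chunking step, assuming Python sources; each chunk would go through run_llama() separately and the answers would then be aggregated per spec item. Other languages (.sol, .rs, .ts) would need their own parsers.

```python
# Not yet implemented: split a Python file into top-level functions/classes so
# each chunk can be evaluated by DeepSeek-Coder separately and the results merged.
import ast

def split_into_chunks(path):
    """Return {name: source} for each top-level function/class in a .py file."""
    with open(path, encoding="utf-8") as f:
        source = f.read()
    chunks = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks[node.name] = ast.get_source_segment(source, node)
    return chunks

if __name__ == "__main__":
    for name, src in split_into_chunks("check_progress.py").items():
        print(name, len(src), "chars")
```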