Misleading error message if image in job/task config is defined incorrectly #826

Open
YevheniiSemendiak opened this issue Apr 28, 2022 · 0 comments
Labels
bug Something isn't working

Comments

@YevheniiSemendiak
Collaborator

Consider this live.yml:

kind: live

images:
  train:
    ref: image:/$[[ project.owner ]]/$[[ flow.project_id ]]:v1
    dockerfile: $[[ flow.workspace ]]/Dockerfile
    context: $[[ flow.workspace ]]/

jobs:
  test:
    image: ${{ images.train }}
    bash: |
      echo 123

And this batch.yml:

kind: batch

images:
  train:
    ref: image:test:v1
    dockerfile: $[[ flow.workspace ]]/Dockerfile
    context: $[[ flow.workspace ]]/
    build_preset: cpu-small
tasks:
  - name: test
    image: ${{ images.train }}
    bash: |
      echo 123

Running the test job gives a confusing error message:

ERROR: Invalid local image 'ImageCtx(id='train', ref='ubuntu', context=PosixPath('/Users/ysem/work/projects/TMP/neuro project'), 
dockerfile=PosixPath('/Users/ysem/work/projects/TMP/neuro project/Dockerfile'), dockerfile_rel=PosixPath('Dockerfile'), build_args=[], 
env={}, volumes=[], build_preset='cpu-small', force_rebuild=False)': invalid image name. Docker specifies it to be the following:
Name components may contain lowercase letters, digits and separators. A separator is defined as a period, one or two underscores, or one or 
more dashes. A name component may not start or end with a separator.

Baking the batch gives a huge and equally confusing traceback and error message:

Executor logs
√ Job ID: job-bf024b35-675f-4061-9607-a966eff46e65
- Status: pending Creating
- Status: pending Scheduling
- Status: pending ContainerCreating
√ Status: running
√ Http URL: https://job-bf024b35-675f-4061-9607-a966eff46e65.jobs.onprem-poc.org.neu.ro
√ The job will die in a day. See --life-span option documentation for details.

√ =========== Job is running in terminal mode ===========
√ (If you don't see a command prompt, try pressing enter)
√ (Use Ctrl-P Ctrl-Q key sequence to detach from the job)
[15:47:27] Fetch configs metadata                                               
[15:47:31] Execute attempt #1                                                                                                                
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│                                                                                                  │
│ /root/.local/lib/python3.10/site-packages/neuro_sdk/_parsing_utils.py:142 in                     │
│ parse_as_local_image                                                                             │
│                                                                                                  │
│   139 │   def parse_as_local_image(self, image: str) -> LocalImage:                              │
│   140 │   │   try:                                                                               │
│   141 │   │   │   self._validate_image_name(image)                                               │
│ ❱ 142 │   │   │   return self._parse_as_local_image(image)                                       │
│   143 │   │   except ValueError as e:                                                            │
│   144 │   │   │   raise ValueError(f"Invalid local image '{image}': {e}") from e                 │
│   145                                                                                            │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ image = "ImageCtx(id='train', ref='image:test:v1',                                           │ │
│ │         context=URL('storage:.flow/neuro_proje"+217                                          │ │
│ │  self = <neuro_sdk._parsing_utils._ImageNameParser object at 0x7f569f3339d0>                 │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│ /root/.local/lib/python3.10/site-packages/neuro_sdk/_parsing_utils.py:227 in                     │
│ _parse_as_local_image                                                                            │
│                                                                                                  │
│   224 │   def _parse_as_local_image(self, image: str) -> LocalImage:                             │
│   225 │   │   if image.startswith("image:"):                                                     │
│   226 │   │   │   raise ValueError("scheme 'image://' is not allowed for local images")          │
│ ❱ 227 │   │   name, tag = self._split_image_name(image, "latest")                                │
│   228 │   │   return LocalImage(name=name, tag=tag)                                              │
│   229 │                                                                                          │
│   230 │   def _parse_as_neuro_image(                                                             │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ image = "ImageCtx(id='train', ref='image:test:v1',                                           │ │
│ │         context=URL('storage:.flow/neuro_proje"+217                                          │ │
│ │  self = <neuro_sdk._parsing_utils._ImageNameParser object at 0x7f569f3339d0>                 │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /root/.local/lib/python3.10/site-packages/neuro_sdk/_parsing_utils.py:321 in _split_image_name   │
│                                                                                                  │
│   318 │   │   │   │   raise ValueError("too many tags")                                          │
│   319 │   │   │   name, tag = image.rsplit(":", 1)                                               │
│   320 │   │   else:                                                                              │
│ ❱ 321 │   │   │   raise ValueError("too many tags")                                              │
│   322 │   │   if "/" in name:                                                                    │
│   323 │   │   │   _, name_no_repo = name.split("/", 1)                                           │
│   324 │   │   else:                                                                              │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ colon_count = 4                                                                              │ │
│ │ default_tag = 'latest'                                                                       │ │
│ │       image = "ImageCtx(id='train', ref='image:test:v1',                                     │ │
│ │               context=URL('storage:.flow/neuro_proje"+217                                    │ │
│ │        self = <neuro_sdk._parsing_utils._ImageNameParser object at 0x7f569f3339d0>           │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: too many tags

The above exception was the direct cause of the following exception:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│                                                                                                  │
│ /root/.local/lib/python3.10/site-packages/neuro_flow/batch_executor.py:859 in run                │
│                                                                                                  │
│    856 │   async def run(self) -> TaskStatus:                                                    │
│    857 │   │   with self._progress:                                                              │
│    858 │   │   │   try:                                                                          │
│ ❱  859 │   │   │   │   result = await self._run()                                                │
│    860 │   │   │   except (KeyboardInterrupt, asyncio.CancelledError):                           │
│    861 │   │   │   │   await self._cancel_unfinished()                                           │
│    862 │   │   │   │   result = TaskStatus.CANCELLED                                             │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │  exp = ValueError("Invalid local image 'ImageCtx(id='train', ref='image:test:v1',            │ │
│ │        context=URL('storage:.flow/neuro_project/image/test/v1'),                             │ │
│ │        dockerfile=URL('storage:.flow/neuro_project/image/test/v1/Dockerfile'),               │ │
│ │        dockerfile_rel=PurePosixPath('Dockerfile'), build_args=[], env={}, volumes=[],        │ │
│ │        build_preset='cpu-small', force_rebuild=False)': too many tags")                      │ │
│ │ self = <neuro_flow.batch_executor.BatchExecutor object at 0x7f569f3339a0>                    │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│ /root/.local/lib/python3.10/site-packages/neuro_flow/batch_executor.py:928 in _run               │
│                                                                                                  │
│    925 │   │   │   │   elif await flow.is_action(tid):                                           │
│    926 │   │   │   │   │   await self._process_action(full_id)                                   │
│    927 │   │   │   │   else:                                                                     │
│ ❱  928 │   │   │   │   │   await self._process_task(full_id)                                     │
│    929 │   │   │                                                                                 │
│    930 │   │   │   ok = await self._process_started()                                            │
│    931 │   │   │   await self._process_running_builds()                                          │
│                                                                                                  │
│ ╭────────────────────────────────────── locals ──────────────────────────────────────╮           │
│ │ _fmt_debug = <function BatchExecutor._run.<locals>._fmt_debug at 0x7f569f46b010>   │           │
│ │       _mgr = <neuro_flow.batch_executor.BakeTasksManager object at 0x7f569f32e1a0> │           │
│ │       flow = <neuro_flow.context.RunningBatchFlow object at 0x7f569f333880>        │           │
│ │    full_id = ('task-1',)                                                           │           │
│ │     job_id = 'job-bf024b35-675f-4061-9607-a966eff46e65'                            │           │
│ │       meta = TaskMeta(                                                             │           │
│ │              │   enable=True,                                                      │           │
│ │              │   strategy=StrategyCtx(fail_fast=True, max_parallel=10),            │           │
│ │              │   cache=CacheConf(                                                  │           │
│ │              │   │   strategy=<CacheStrategy.DEFAULT: 'default'>,                  │           │
│ │              │   │   life_span=1209600                                             │           │
│ │              │   )                                                                 │           │
│ │              )                                                                     │           │
│ │       self = <neuro_flow.batch_executor.BatchExecutor object at 0x7f569f3339a0>    │           │
│ │        tid = 'task-1'                                                              │           │
│ ╰────────────────────────────────────────────────────────────────────────────────────╯           │
│                                                                                                  │
│ /root/.local/lib/python3.10/site-packages/neuro_flow/batch_executor.py:807 in _process_task      │
│                                                                                                  │
│    804 │   │   │   │   │   )                                                                     │
│    805 │   │                                                                                     │
│    806 │   │   if storage_task is None:                                                          │
│ ❱  807 │   │   │   await self._start_task(full_id, task)                                         │
│    808 │                                                                                         │
│    809 │   async def _load_previous_run(self) -> None:                                           │
│    810 │   │   log.debug(f"BatchExecutor: loading previous run")                                 │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ cache_strategy = <CacheStrategy.DEFAULT: 'default'>                                          │ │
│ │        full_id = ('task-1',)                                                                 │ │
│ │              n = 1                                                                           │ │
│ │           node = ('task-1',)                                                                 │ │
│ │         prefix = ()                                                                          │ │
│ │           self = <neuro_flow.batch_executor.BatchExecutor object at 0x7f569f3339a0>          │ │
│ │   storage_task = None                                                                        │ │
│ │           task = Task(                                                                       │ │
│ │                  │   title=None,                                                             │ │
│ │                  │   name='test',                                                            │ │
│ │                  │   image="ImageCtx(id='train', ref='image:test:v1',                        │ │
│ │                  context=URL('storage:.flow/neuro_proje"+217,                                │ │
│ │                  │   preset=None,                                                            │ │
│ │                  │   schedule_timeout=None,                                                  │ │
│ │                  │   http_port=None,                                                         │ │
│ │                  │   http_auth=None,                                                         │ │
│ │                  │   pass_config=None,                                                       │ │
│ │                  │   entrypoint=None,                                                        │ │
│ │                  │   cmd="bash -euo pipefail -c 'echo 123'",                                 │ │
│ │                  │   workdir=None,                                                           │ │
│ │                  │   volumes=[],                                                             │ │
│ │                  │   life_span=None,                                                         │ │
│ │                  │   env={},                                                                 │ │
│ │                  │   tags={                                                                  │ │
│ │                  │   │   'bake_id:bake-f3f3ef60-c223-4651-bfc6-864563c52b6c',                │ │
│ │                  │   │   'project:neuro-project',                                            │ │
│ │                  │   │   'task:task-1',                                                      │ │
│ │                  │   │   'flow:bake'                                                         │ │
│ │                  │   },                                                                      │ │
│ │                  │   id=None,                                                                │ │
│ │                  │   enable=True,                                                            │ │
│ │                  │   strategy=StrategyCtx(fail_fast=True, max_parallel=10),                  │ │
│ │                  │   cache=CacheConf(                                                        │ │
│ │                  │   │   strategy=<CacheStrategy.DEFAULT: 'default'>,                        │ │
│ │                  │   │   life_span=1209600                                                   │ │
│ │                  │   ),                                                                      │ │
│ │                  │                                                                           │ │
│ │                  caching_key='a34f0ca3947a7478b38d73b67726fb44a622a4799c2f1f990da92be340da0… │ │
│ │                  )                                                                           │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /root/.local/lib/python3.10/site-packages/neuro_flow/batch_executor.py:1083 in _start_task       │
│                                                                                                  │
│   1080 │                                                                                         │
│   1081 │   async def _start_task(self, full_id: FullID, task: Task) -> Optional[StorageTask]:    │
│   1082 │   │   log.debug(f"BatchExecutor: checking should we build image for {full_id}")         │
│ ❱ 1083 │   │   remote_image = self._client.parse_remote_image(task.image)                        │
│   1084 │   │   log.debug(f"BatchExecutor: image name is {remote_image}")                         │
│   1085 │   │   if remote_image.cluster_name is None:  # Not a neuro registry image               │
│   1086 │   │   │   return await self._run_task(full_id, task)                                    │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ full_id = ('task-1',)                                                                        │ │
│ │    self = <neuro_flow.batch_executor.BatchExecutor object at 0x7f569f3339a0>                 │ │
│ │    task = Task(                                                                              │ │
│ │           │   title=None,                                                                    │ │
│ │           │   name='test',                                                                   │ │
│ │           │   image="ImageCtx(id='train', ref='image:test:v1',                               │ │
│ │           context=URL('storage:.flow/neuro_proje"+217,                                       │ │
│ │           │   preset=None,                                                                   │ │
│ │           │   schedule_timeout=None,                                                         │ │
│ │           │   http_port=None,                                                                │ │
│ │           │   http_auth=None,                                                                │ │
│ │           │   pass_config=None,                                                              │ │
│ │           │   entrypoint=None,                                                               │ │
│ │           │   cmd="bash -euo pipefail -c 'echo 123'",                                        │ │
│ │           │   workdir=None,                                                                  │ │
│ │           │   volumes=[],                                                                    │ │
│ │           │   life_span=None,                                                                │ │
│ │           │   env={},                                                                        │ │
│ │           │   tags={                                                                         │ │
│ │           │   │   'bake_id:bake-f3f3ef60-c223-4651-bfc6-864563c52b6c',                       │ │
│ │           │   │   'project:neuro-project',                                                   │ │
│ │           │   │   'task:task-1',                                                             │ │
│ │           │   │   'flow:bake'                                                                │ │
│ │           │   },                                                                             │ │
│ │           │   id=None,                                                                       │ │
│ │           │   enable=True,                                                                   │ │
│ │           │   strategy=StrategyCtx(fail_fast=True, max_parallel=10),                         │ │
│ │           │   cache=CacheConf(                                                               │ │
│ │           │   │   strategy=<CacheStrategy.DEFAULT: 'default'>,                               │ │
│ │           │   │   life_span=1209600                                                          │ │
│ │           │   ),                                                                             │ │
│ │           │   caching_key='a34f0ca3947a7478b38d73b67726fb44a622a4799c2f1f990da92be340da087f' │ │
│ │           )                                                                                  │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /root/.local/lib/python3.10/site-packages/neuro_flow/batch_executor.py:439 in parse_remote_image │
│                                                                                                  │
│    436 │   │   return await self._client.images.tag_info(remote_image)                           │
│    437 │                                                                                         │
│    438 │   def parse_remote_image(self, image: str) -> RemoteImage:                              │
│ ❱  439 │   │   return self._client.parse.remote_image(image)                                     │
│    440 │                                                                                         │
│    441 │   @property                                                                             │
│    442 │   def config_presets(self) -> Mapping[str, Preset]:                                     │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ image = "ImageCtx(id='train', ref='image:test:v1',                                           │ │
│ │         context=URL('storage:.flow/neuro_proje"+217                                          │ │
│ │  self = <neuro_flow.batch_executor.RetryReadNeuroClient object at 0x7f569f3335b0>            │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /root/.local/lib/python3.10/site-packages/neuro_sdk/_parser.py:181 in remote_image               │
│                                                                                                  │
│   178 │   │   tag_option: TagOption = TagOption.DEFAULT,                                         │
│   179 │   │   cluster_name: Optional[str] = None,                                                │
│   180 │   ) -> RemoteImage:                                                                      │
│ ❱ 181 │   │   return self._get_image_parser(cluster_name).parse_remote(                          │
│   182 │   │   │   image, tag_option=tag_option                                                   │
│   183 │   │   )                                                                                  │
│   184                                                                                            │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ cluster_name = None                                                                          │ │
│ │        image = "ImageCtx(id='train', ref='image:test:v1',                                    │ │
│ │                context=URL('storage:.flow/neuro_proje"+217                                   │ │
│ │         self = <neuro_sdk.Parser object at 0x7f569f332920>                                   │ │
│ │   tag_option = <TagOption.DEFAULT: 3>                                                        │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /root/.local/lib/python3.10/site-packages/neuro_sdk/_parsing_utils.py:168 in parse_remote        │
│                                                                                                  │
│   165 │   │   if value.startswith("image:") or self._find_by_registry(value):                    │
│   166 │   │   │   return self.parse_as_neuro_image(value, tag_option=tag_option)                 │
│   167 │   │                                                                                      │
│ ❱ 168 │   │   img = self.parse_as_local_image(value)                                             │
│   169 │   │   name = img.name                                                                    │
│   170 │   │   registry = None                                                                    │
│   171 │   │   if ":" in name:                                                                    │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │       self = <neuro_sdk._parsing_utils._ImageNameParser object at 0x7f569f3339d0>            │ │
│ │ tag_option = <TagOption.DEFAULT: 3>                                                          │ │
│ │      value = "ImageCtx(id='train', ref='image:test:v1',                                      │ │
│ │              context=URL('storage:.flow/neuro_proje"+217                                     │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /root/.local/lib/python3.10/site-packages/neuro_sdk/_parsing_utils.py:144 in                     │
│ parse_as_local_image                                                                             │
│                                                                                                  │
│   141 │   │   │   self._validate_image_name(image)                                               │
│   142 │   │   │   return self._parse_as_local_image(image)                                       │
│   143 │   │   except ValueError as e:                                                            │
│ ❱ 144 │   │   │   raise ValueError(f"Invalid local image '{image}': {e}") from e                 │
│   145 │                                                                                          │
│   146 │   def parse_as_neuro_image(                                                              │
│   147 │   │   self, image: str, *, tag_option: TagOption = TagOption.DEFAULT                     │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ image = "ImageCtx(id='train', ref='image:test:v1',                                           │ │
│ │         context=URL('storage:.flow/neuro_proje"+217                                          │ │
│ │  self = <neuro_sdk._parsing_utils._ImageNameParser object at 0x7f569f3339d0>                 │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Invalid local image 'ImageCtx(id='train', ref='image:test:v1', context=URL('storage:.flow/neuro_project/image/test/v1'), 
dockerfile=URL('storage:.flow/neuro_project/image/test/v1/Dockerfile'), dockerfile_rel=PurePosixPath('Dockerfile'), build_args=[], env={}, 
volumes=[], build_preset='cpu-small', force_rebuild=False)': too many tags
[15:47:41] ERROR: Some unknown error happened. Please report an issue to https://github.com/neuro-inc/neuro-flow/issues/new with traceback   
           printed above.                                                                                                                    
<bake> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 11 sec
                                                            ╭───────────────────╮                                                            
                                                            │ Attempt #1 failed │                                                            
                                                            ╰───────────────────╯                                                            
[15:47:45] Hint: you can restart bake starting from first failed task by the following command:                                              
           neuro-flow restart bake-f3f3ef60-c223-4651-bfc6-864563c52b6c                                                                      

√ Job job-bf024b35-675f-4061-9607-a966eff46e65 stopped

In both cases the problem is the same: the user referenced the whole image object instead of its ref, writing image: ${{ images.train }} where image: ${{ images.train.ref }} was intended. The error message is confusing, and without knowing the expression syntax it is hard to guess what is wrong.
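
To illustrate why the parser ends up with a string like "ImageCtx(id='train', ...)", here is a minimal sketch. It assumes the expression result is simply converted to a string before being used as the image name, which is what the tracebacks above suggest. The ImageCtx dataclass and the render helper below are simplified, hypothetical stand-ins, not the real neuro-flow classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageCtx:
    """Hypothetical, simplified stand-in for the image context object."""

    id: str
    ref: str


def render(value: object) -> str:
    """Mimic template substitution by converting the value to a string."""
    return str(value)


train = ImageCtx(id="train", ref="image:test:v1")

# Roughly what `${{ images.train }}` turns into: the repr of the whole object.
whole_object = render(train)

# Roughly what `${{ images.train.ref }}` turns into: just the reference.
only_ref = render(train.ref)

print(whole_object)
print(only_ref)
```

The stringified object contains several colons and characters that are not allowed in a Docker image name, which would explain both the "too many tags" and the "invalid image name" errors.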

We need to add a proper error message in both cases.
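
One possible shape for such a check is sketched below. This is only an illustration under assumptions: check_image_value and the regular expression are hypothetical names, not part of neuro-flow or neuro-sdk, and the check assumes that a stringified context object always looks like "SomethingCtx(...)". A check on the value's type before it is converted to a string would likely be more robust, but where that conversion happens is not shown in this issue.

```python
import re

# Assumption: a stringified context object looks like "SomeNameCtx(...)".
_CTX_REPR = re.compile(r"^(?P<cls>[A-Za-z_]\w*Ctx)\(.*\)$", re.DOTALL)


def check_image_value(image: str, field: str = "image") -> str:
    """Return the image string unchanged, or fail with a helpful message.

    Hypothetical helper: it rejects values that look like the repr of a
    whole context object before they reach the image name parser.
    """
    match = _CTX_REPR.match(image.strip())
    if match is not None:
        raise ValueError(
            f"'{field}' evaluated to a whole {match['cls']} object, "
            "not an image reference. Did you mean to use its '.ref' "
            "attribute, e.g. ${{ images.train.ref }}?"
        )
    return image
```

Called with "image:test:v1" the helper returns the value unchanged. Called with the stringified ImageCtx from the tracebacks above, it raises a ValueError that points the user at the .ref attribute instead of reporting "too many tags".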

@YevheniiSemendiak YevheniiSemendiak added the bug Something isn't working label Apr 28, 2022