This repository enables language-based interaction with robots. Using myCobot, pick-and-place tasks are carried out through spoken interaction with a human. ChatGPT is used to interpret images, generate pick-and-place programs, and ask follow-up questions when an instruction is ambiguous.
Image interpretation uses SoM (Set-of-Mark prompting): numbers are overlaid on the segmented regions of the camera image, which lets GPT-4V refer to specific objects by their number.
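To illustrate the idea, the sketch below segments a camera image with segment-anything and overlays an index number on each region with OpenCV. It is only a rough illustration of SoM-style marking, not the repository's actual implementation; the checkpoint path, file names, and drawing details are assumptions.

```python
# Rough sketch of SoM-style marking: segment the image and overlay an index
# number at each region's centroid.  Checkpoint path and file names are
# assumptions, not the repository's actual code.
import cv2
import numpy as np
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

image = cv2.cvtColor(cv2.imread("capture.png"), cv2.COLOR_BGR2RGB)

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
masks = SamAutomaticMaskGenerator(sam).generate(image)

marked = image.copy()
for i, m in enumerate(masks):
    ys, xs = np.nonzero(m["segmentation"])   # pixels belonging to mask i
    cx, cy = int(xs.mean()), int(ys.mean())  # centroid of the region
    cv2.putText(marked, str(i), (cx, cy), cv2.FONT_HERSHEY_SIMPLEX,
                1.0, (255, 0, 0), 2)

cv2.imwrite("marked.png", cv2.cvtColor(marked, cv2.COLOR_RGB2BGR))
# The marked image is what gets sent to GPT-4V so objects can be referenced by number.
```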
- openai
- openai-whisper
- segment-anything
- pymycobot
Install the PortAudio development package (needed for microphone input), then set up the repository:

```bash
sudo apt install portaudio19-dev
```

```bash
git clone https://github.com/neka-nat/mylangrobot.git
cd mylangrobot
pip install -e . # or poetry install
cp .env.sample .env
# Edit .env
# OPENAI_API_KEY=<your api key>
```
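For reference, the key in `.env` can be picked up like this (a minimal sketch assuming python-dotenv and the openai>=1.0 client; the repository itself may load it differently):

```python
# Minimal sketch of loading the key from .env (assumes python-dotenv and the
# openai>=1.0 client; the repository may do this differently).
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # copies OPENAI_API_KEY from .env into the process environment
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```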
Interactive demonstration using the microphone:
```bash
sudo chmod 666 /dev/ttyUSBxxx
cd scripts
python demo.py
```
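The interactive demo combines microphone input with speech recognition. Below is a hedged sketch of such a loop using PyAudio and openai-whisper; the recording parameters, model size, and control flow are assumptions rather than the demo's actual code.

```python
# Hedged sketch of the voice-command step: record a few seconds from the
# microphone with PyAudio and transcribe it with openai-whisper.
import wave

import pyaudio
import whisper

RATE, SECONDS, CHUNK = 16000, 5, 1024

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)
frames = [stream.read(CHUNK) for _ in range(RATE * SECONDS // CHUNK)]
stream.stop_stream()
stream.close()
pa.terminate()

with wave.open("command.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)  # paInt16 -> 2 bytes per sample
    f.setframerate(RATE)
    f.writeframes(b"".join(frames))

model = whisper.load_model("base")
print(model.transcribe("command.wav", language="ja")["text"])
```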
Demonstration that sends a single text command:
```bash
sudo chmod 666 /dev/ttyUSBxxx
cd scripts
python oneshot_demo.py --prompt チョコレートの箱取って。  # "Pick up the chocolate box."
```
The command is turned into a short pick-and-place program such as the following, which the robot then executes:

```python
move_to_object(4)
grab()
move_to_place('drop')
release()
```
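As a rough illustration of what these primitives might correspond to, the sketch below maps `grab`, `release`, and `move_to_place` onto pymycobot calls using the port, suction pin, and joint poses from `configs/settings.yml`. This is a hypothetical simplification, not the repository's implementation; in particular the suction pin polarity and the omission of the camera-to-robot coordinate conversion in `move_to_object` are assumptions.

```python
# Hypothetical mapping of the generated primitives onto pymycobot calls,
# using the port, suction pin, and joint poses from configs/settings.yml.
from pymycobot.mycobot import MyCobot

mc = MyCobot("/dev/ttyACM0", 115200)
SUCTION_PIN = 5
POSITIONS = {
    "home": [0, 20, -130, 20, 0, 0],
    "capture": [0, 0, -30, -60, 0, -45],
    "drop": [-45, 20, -130, 20, 0, 0],
}

def move_to_place(name: str, speed: int = 40) -> None:
    """Move to one of the named joint-angle poses."""
    mc.send_angles(POSITIONS[name], speed)

def grab() -> None:
    """Switch the suction pump on (assumed active-low pin)."""
    mc.set_basic_output(SUCTION_PIN, 0)

def release() -> None:
    """Switch the suction pump off."""
    mc.set_basic_output(SUCTION_PIN, 1)

# move_to_object(i) would additionally convert the pixel position of object i
# (from the SoM-marked image) into robot coordinates before descending on it.
```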
Demo video: mylangrobot_demo.mp4
The robot can be configured via the configs/settings.yml file. Set the serial port for the robot connection, the camera ID, the suction pin number, and the other hardware-related settings to match your own environment:
```yaml
pixel_size_on_capture_position: 0.00043 # [m/pixel]
interface_type: "AUDIO"
camera_id: 0
language: "Japanese"
mycobot_settings:
  urdf_path: "../data/mycobot/mycobot.urdf"
  end_effector_name: "camera_flange"
  port: "/dev/ttyACM0"
  baud: 115200
  default_speed: 40
  default_z_speed: 20
  suction_pin: 5
  command_timeout: 5
  use_gravity_compensation: false
  end_effector_height: 0.065 # pump head offset
  object_height: 0.01
  release_height: 0.05
  positions:
    home: [0, 20, -130, 20, 0, 0]
    capture: [0, 0, -30, -60, 0, -45]
    drop: [-45, 20, -130, 20, 0, 0]
```
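The settings file can be read with PyYAML, for example (an assumed loader sketch; the repository may wrap this in its own configuration class):

```python
# Assumed loader sketch: read settings.yml with PyYAML and pull out a few
# values; the repository may handle configuration differently.
import yaml

with open("configs/settings.yml") as f:
    settings = yaml.safe_load(f)

robot = settings["mycobot_settings"]
print(robot["port"])               # "/dev/ttyACM0"
print(robot["positions"]["home"])  # [0, 20, -130, 20, 0, 0]
```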