When using ArduCopter settings simulation hangs #2594

Open
ZoguBK opened this issue Apr 16, 2020 · 13 comments

@ZoguBK
ZoguBK commented Apr 16, 2020

The project blocks when used with the following settings:
{
    "SeeDocsAt": "https://github.com/Microsoft/AirSim/blob/master/docs/settings.md",
    "SettingsVersion": 1.2,
    "SimMode": "Multirotor",
    "Vehicles": {
        "Copter": {
            "VehicleType": "ArduCopter",
            "UseSerial": false,
            "DefaultVehicleState": "Disarmed",
            "UdpIp": "127.0.0.1",
            "UdpPort": 9003,
            "SitlPort": 9002
        }
    }
}

Then when I start the simulation, it just hangs. It never exits the while loop.
This is the while loop:

while (recv_ret != sizeof(pkt)) {
    if (recv_ret <= 0) {
        Utils::log(Utils::stringf("Error while receiving rotor control data - ErrorNo: %d", recv_ret), Utils::kLogLevelInfo);
    } else {
        Utils::log(Utils::stringf("Received %d bytes instead of %zu bytes", recv_ret, sizeof(pkt)), Utils::kLogLevelInfo);
    }
    recv_ret = udpSocket_->recv(&pkt, sizeof(pkt), 100);
}
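
For illustration only, here is a minimal sketch of how a receive loop like the one above could give up after a fixed number of attempts instead of spinning forever. It is not AirSim's code: `RecvFn` and `recvWithRetryLimit` are hypothetical names, and `RecvFn` merely stands in for `udpSocket_->recv`. Whether bounding the loop is desirable depends on the design, since blocking may be intentional.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical stand-in for udpSocket_->recv(buf, len, timeout_ms):
// returns the number of bytes received, or <= 0 on error/timeout.
using RecvFn = std::function<int(void* buf, std::size_t len, int timeout_ms)>;

// Tries to receive exactly `len` bytes, giving up after `max_attempts`
// attempts instead of looping forever. Returns true on success.
inline bool recvWithRetryLimit(const RecvFn& recv, void* buf, std::size_t len,
                               int timeout_ms, int max_attempts)
{
    assert(max_attempts > 0);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        const int ret = recv(buf, len, timeout_ms);
        if (ret > 0 && static_cast<std::size_t>(ret) == len) {
            return true; // full packet received
        }
        // ret <= 0 means timeout or socket error; any other value is a
        // packet of the wrong size. Either way, try again.
    }
    return false; // caller decides whether to keep waiting or abort
}
```

With a 100 ms receive timeout, `max_attempts = 50` would bound the wait to roughly five seconds before the caller gets control back.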

@rajat2004
Contributor

Yes, the simulation will hang if ArduPilot SITL isn't launched. This is an effect of lockstep scheduling, which is how the ArduPilot support was designed: AirSim waits for the other side to respond before advancing the time and physics. This gives constant time steps and allows attaching a debugger to either AP or even AirSim without affecting the physics. Since the physics runs at a constant rate, a slow computer affects only the rendering, not the physics.
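
The lockstep idea described here can be sketched as a toy model. This is an illustration under stated assumptions, not AirSim's implementation: `LockstepSim`, `tryStep` and the `controller_replied` callback are hypothetical names, with the callback standing in for "a control packet arrived from ArduPilot SITL".

```cpp
#include <cassert>
#include <functional>

// Toy lockstep simulator: simulation time only advances when the
// controller (standing in for ArduPilot SITL) replies for this step.
struct LockstepSim {
    double sim_time_s = 0.0; // simulated time, independent of wall-clock time
    double dt_s;             // fixed physics step in seconds

    explicit LockstepSim(double dt) : dt_s(dt) { assert(dt > 0.0); }

    // `controller_replied` is a hypothetical callback: true when a control
    // packet arrived for this step, false when the wait timed out.
    // Returns true if the step was taken.
    bool tryStep(const std::function<bool()>& controller_replied)
    {
        if (!controller_replied()) {
            return false; // no reply: physics and time stay frozen
        }
        sim_time_s += dt_s; // reply received: advance by exactly one fixed step
        return true;
    }
};
```

If the controller never replies, `sim_time_s` never moves, which corresponds to the hang reported above. However long each reply takes in wall-clock time, every step still advances the simulation by the same `dt_s`.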

@ZoguBK
Author

ZoguBK commented Apr 24, 2020

@rajat2004 thanks for the explanation.
The other issue I have is that my sim has two levels, and when I change the level during play with the Unreal OpenLevel node, it also hangs,
even if it started just fine and everything was working in the first level. I guess it loses the connection.
What can I do in this situation?

@rajat2004
Contributor

I haven't ever worked with 2 levels, so I have no idea how it behaves in that situation. When changing levels, does the simulation stop? That is, does AirSim get stopped and start again once the level change is finished, or do you have to start it manually? Does the physics from the previous level carry over to the next level?
If there's just a pause in the simulation and it starts back up again, I think it should work, since it can reconnect when UE finishes loading the level. Is there any error message about the connection?

@mhl787156

> Yes, the simulation will hang if ArduPilot SITL isn't launched. This is an effect of Lockstep scheduling which is how the ArduPilot support was designed, it waits for the other side to respond before advancing the time and physics. This enables constant timing steps and allows attaching a debugger to either AP or even AirSim without it affecting the physics. Since the physics runs at a constant frame rate, even if the computer is slow, the rendering will be affected, not the physics

Just to continue this question: I am running AirSim on Windows on one machine, and have ArduCopter running on a separate laptop. I have set LocalHostIp to 0.0.0.0 and UdpIp to the IP of the laptop. The laptop is running the sim_vehicle command with '-A --sim-address=<ip address of windows machine>'.

This all starts up fine, but AirSim runs incredibly slowly: one frame every few minutes! Checking the logs, I also get many lines of 'LogTemp: Error while receiving rotor control data - ErrorNo: -1', I guess for the same reason as the OP. ArduPilot, on the other hand, is constantly complaining of 'No sensor message received - Bad file descriptor'.

Is this due to the lockstep implementation, where AirSim is waiting on ArduPilot? But it seems that ArduPilot is also waiting on AirSim! Is this possibly due to a bad connection or something else?

Many Thanks (Let me know if I should raise a new issue instead of commenting here!)

@rajat2004
Contributor

Seems like a bad connection on the ArduPilot side; "bad file descriptor" generally means it failed to create the socket. Is the other laptop running Linux, or is it Windows with WSL? You have probably already checked whether the machine is reachable.

About running slow: yes, that's due to the lockstep implementation. Since messages aren't being received, it's not proceeding further. Is it actually running very slowly, or just completely stuck? The second would make more sense; the first means there's some different issue, since some messages are getting through.

One option you could try first is to use WSL on your Windows machine to make sure things are working. With WSL 1, you don't need to set the IP addresses at all; the defaults work since Windows and WSL are on the same subnet. With WSL 2, they are on different networks, so it's more like a full-fledged separate machine.

I've tested AirSim & AP running on 2 separate Linux machines, and on Windows with both WSL 1 and WSL 2, and it all works according to the docs. I'll see if I can try running AirSim on Windows and AP on a VM to test more thoroughly, but that will take some time.

Also try disabling the Windows Firewall, or add a rule for UE4 or the .exe file, see https://discuss.ardupilot.org/t/gsoc-2019-airsim-simulator-support-for-ardupilot-sitl-part-ii/46395/5?u=rajat2004

@mhl787156

Thanks for your reply! So the setup involves a Linux laptop running ArduPilot, connecting over the local network to a Windows machine running AirSim. I think you are right in saying it might be ArduPilot failing at something, as running AirSim + ArduPilot on WSL 1 on Windows runs just fine!

Interesting, so the frame/physics rate is tied to the connection latency between the machines. Interestingly, it just runs very, very slowly, at a non-constant rate. I first start AirSim, then run_vehicles.py. About a minute in, the drone model shows up in AirSim but freezes. After attempting to mode guidance + arm + takeoff, AirSim may or may not update to a frame of the drone beginning to take off, and maybe another of it in the air. It appears to be different every time: sometimes it freezes when the drone appears, sometimes it gives me about 10 intermittent frames and I give up after 20 minutes! It's confusing that so few seem to get through. You would expect either most or none at all!

Yes, I've turned off Windows Firewall for private connections, and also (hopefully) let it through ufw on Linux as well!

Thanks, I'll do some more testing with the configurations of sim_vehicle and mavproxy too. I may also try wiring my laptop directly to the desktop and seeing what happens! I'll update here if I find anything.

@rajat2004
Contributor

Since some of the messages are going through (which is strange), maybe it's stuck in a loop where both sides are waiting for each other, as you mentioned. Could you try a fix I've added here - ArduPilot/ardupilot@master...rajat2004:airsim-resend
This resends the servo output from the AP side after a 1s timeout (which might be too high).
I think that if timeouts are happening, some proper handling of the received simulation time might be required.
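
A resend-on-timeout scheme like the one described can be sketched as below. This is not the code from the linked branch: `ResendTimer` and its methods are hypothetical names, and the clock is passed in explicitly so the logic can be exercised without real waiting.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Tracks when the last sensor message arrived and reports when the
// control output should be resent. All names here are hypothetical.
class ResendTimer {
public:
    ResendTimer(Clock::time_point now, std::chrono::milliseconds timeout)
        : last_rx_(now), timeout_(timeout)
    {
        assert(timeout.count() > 0);
    }

    // Call whenever a sensor message is received.
    void onSensorMessage(Clock::time_point now) { last_rx_ = now; }

    // Returns true if nothing has arrived for at least `timeout_`, and
    // restarts the timer so a resend happens at most once per timeout period.
    bool shouldResend(Clock::time_point now)
    {
        if (now - last_rx_ < timeout_) {
            return false;
        }
        last_rx_ = now;
        return true;
    }

private:
    Clock::time_point last_rx_;
    std::chrono::milliseconds timeout_;
};
```

A shorter timeout recovers faster from a lost packet but resends more often on a slow link, which is the trade-off behind the "might be too high" remark.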

AirSim uses a separate physics loop for drones, which is not tied to the graphics rate. The physics runs at 333Hz by default, and the graphics run at a speed that depends on the GPU, etc. That's useful for running on slower machines and allows debugging on either side. Note that this is different from the Car vehicle, which is tied to PhysX running in UE and thus causes variable-rate problems when used with AP Rover.

Connecting the machines directly, maybe over Ethernet, might be a good option.

@mhl787156

I have tried your fix (to double-check the build: is it "./waf configure --board sitl" then "./waf copter" to build the changes?). On the AirSim side things unfortunately look exactly the same: the drone appears, then after some time the rendering updates, and after a while we still get "LogTemp: Error while receiving rotor control data - ErrorNo: -1". On the ArduPilot side, the copter console now mostly no longer complains about no sensor message being received (it just displays FPS); I got the resending servos message only once in 5 or so minutes. However, once I close AirSim on Windows with ArduPilot still running, the ArduCopter console continuously prints "No sensor message received in last 1s, error - Bad file descriptor, resending servos", as expected I suppose.

Interestingly, I hooked up Wireshark to monitor packets, and when both ArduPilot and AirSim are running, they appear to be in near-constant communication! AirSim is sending a packet which includes acceleration, longitude, latitude, etc., and ArduPilot is sending some packet too (no characters). However, they keep sending basically the same packets forever. I'm unsure whether they are response packets or merely heartbeats. Maybe this can give some insight?

I suppose some broken communication is occurring, where neither side can read the other's messages?

@rajat2004
Contributor

rajat2004 commented Jul 16, 2020

Commands are correct, and the console output also seems correct. Could you post the Wireshark capture of the packets here?
It looks like pretty bad intermittent connection problems are occurring, but if the messages are coming continuously, then that's strange. I don't remember any specific cases where messages are discarded, but I will have to take a look at the code again.

Could you also try commenting out https://github.com/ArduPilot/ardupilot/blob/master/libraries/SITL/SIM_AirSim.cpp#L311 so that some extra info is logged on AP side, and post that log as well, might be helpful. Thanks!

And yeah, the packets sent by AP don't have any chars; they just contain direct array values. It didn't seem useful to have JSON format there, since it would require complicated parsing on the AirSim side as well. There aren't any heartbeat packets, and UDP doesn't send any response, so the only packets would be AirSim's sensor data and AP's servo outputs.
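
As a rough illustration of what "direct array values" means on the wire, here is a sketch of a fixed-size binary packet and a size-checked parse. The struct name, the channel count of 11 and the function are assumptions made for this example; the real layout is whatever the AirSim and ArduPilot sources define.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical channel count for illustration; the real value is whatever
// both AirSim and ArduPilot agree on in their sources.
constexpr std::size_t kChannelCount = 11;

// Fixed-size packet of raw PWM values: no text and no field names.
struct ServoPacket {
    std::uint16_t pwm[kChannelCount];
};

// Copies a received datagram into `out`. Returns false unless the datagram
// is exactly one packet long, similar in spirit to a recv_ret != sizeof(pkt) check.
inline bool parseServoPacket(const unsigned char* data, std::size_t len,
                             ServoPacket& out)
{
    assert(data != nullptr);
    if (len != sizeof(ServoPacket)) {
        return false;
    }
    // Assumes sender and receiver use the same byte order.
    std::memcpy(&out, data, sizeof(ServoPacket));
    return true;
}
```

A packet like this shows up in a capture as a short run of raw bytes with no readable characters, which would be consistent with the "no characters" observation in Wireshark.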

@mhl787156

@rajat2004 just testing now, how do I see the logs after commenting out that line?

@rajat2004
Contributor

Logs are stored in a folder named logs, with LASTLOG.txt holding the number of the last saved log file. If you run sim_vehicle.py from the ArduCopter directory, the logs folder will be in that directory; if you run it from the main AP repo folder, it will be there instead. You can also check when the file was created to make sure that it's the correct one.
This page - https://ardupilot.org/dev/docs/using-mavexplorer-for-log-analysis.html - gives some details on how to analyse logs using MAVExplorer if you're interested; Mission Planner can also be used. I'm pretty basic at analyzing logs, but they can reveal things such as time jumps.
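
To locate the newest log programmatically, something like the following sketch could work. The zero-padded eight-digit `.BIN` file name is an assumption made for illustration, so compare it against the actual file names in your logs folder; `lastLogFileName` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>

// Reads the log number from the contents of LASTLOG.txt and builds a log
// file name. The zero-padded 8-digit ".BIN" pattern is an assumption;
// check the actual file names in your logs folder.
// Works with any std::istream, e.g. std::ifstream or std::istringstream.
inline std::string lastLogFileName(std::istream& lastlog)
{
    unsigned number = 0;
    if (!(lastlog >> number)) {
        return ""; // file empty or does not start with a number
    }
    char name[32];
    const int written = std::snprintf(name, sizeof(name), "%08u.BIN", number);
    assert(written > 0 && written < static_cast<int>(sizeof(name)));
    return name;
}
```

In practice you would open the file with `std::ifstream` (from `<fstream>`), pass the stream to the function, and prepend the path of the logs folder to the returned name.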

@mhl787156

Whilst I work out how to view the logs, here are the Wireshark snippets. ipaddr.17 is the AirSim machine, and ipaddr.63 is the Linux ArduPilot machine. All the ports line up, I think! Let me know if you want any more info.

Here we see the constant communication:
[image: wiresharkmain]

Here is the packet from AirSim to ArduPilot:
[image: wireshark_airsim_comms]

Here is the packet from ArduPilot to AirSim:
[image: wireshark_adupilot_comms]

@stale

stale bot commented Apr 17, 2022

This issue has been automatically marked as stale because it has not had activity from the community in the last year. It will be closed if no further activity occurs within 20 days.

@stale stale bot added the stale label Apr 17, 2022